annotate lib/Target/PowerPC/README_ALTIVEC.txt @ 100:7d135dc70f03

LLVM 3.9
author Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
date Tue, 26 Jan 2016 22:53:40 +0900
parents afa8332a0e37
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 //===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4 registers, to generate better spill code.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 The first should be a single lvx from the constant pool, the second should be
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9 a xor/stvx:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 void foo(void) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12 int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 bar (x);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 #include <string.h>
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
17 void foo(void) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18 int x[8] __attribute__((aligned(128)));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 memset (x, 0, sizeof (x));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20 bar (x);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25 Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28 When -ffast-math is on, we can use 0.0.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32 Consider this:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33 v4f32 Vector;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34 v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36 Since we know that "Vector" is 16-byte aligned and we know the element offset
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37 of ".X", we should change the load into a lve*x instruction, instead of doing
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38 a load/store/lve*x sequence.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42 For functions that use altivec AND have calls, we are VRSAVE'ing all call
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43 clobbered regs.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47 Implement passing vectors by value into calls and receiving them as arguments.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51 GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52 of C1/C2/C3, then a load and vperm of Variable.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
54 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 We need a way to teach tblgen that some operands of an intrinsic are required to
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57 be constants. The verifier should enforce this constraint.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62 aligned stack slot, followed by a load/vperm. We should probably just store it
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63 to a scalar stack slot, then use lvsl/vperm to load it. If the value is already
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
64 in memory this is a big win.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
66 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
68 extract_vector_elt of an arbitrary constant vector can be done with the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
69 following instructions:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
70
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
71 vTemp = vec_splat(v0,2); // 2 is the element the src is in.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
72 vec_ste(&destloc,0,vTemp);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
73
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
74 We can do an arbitrary non-constant value by using lvsr/perm/ste.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
75
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
76 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
77
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
78 If we want to tie instruction selection into the scheduler, we can do some
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
79 constant formation with different instructions. For example, we can generate
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
80 "vsplti -1" with "vcmpequw R,R" and 1,1,1,1 with "vsubcuw R,R", and 0,0,0,0 with
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
81 "vsplti 0" or "vxor", each of which use different execution units, thus could
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
82 help scheduling.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
83
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
84 This is probably only reasonable for a post-pass scheduler.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
85
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
86 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
87
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
88 For this function:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
89
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
90 void test(vector float *A, vector float *B) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
91 vector float C = (vector float)vec_cmpeq(*A, *B);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
92 if (!vec_any_eq(*A, *B))
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
93 *B = (vector float){0,0,0,0};
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
94 *A = C;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
95 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
96
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
97 we get the following basic block:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
98
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
99 ...
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
100 lvx v2, 0, r4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
101 lvx v3, 0, r3
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
102 vcmpeqfp v4, v3, v2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
103 vcmpeqfp. v2, v3, v2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
104 bne cr6, LBB1_2 ; cond_next
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
105
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
106 The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
107 vcmpeqfp. result is used by a branch. This can be improved.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
108
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
109 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
110
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
111 The code generated for this is truly aweful:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
112
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
113 vector float test(float a, float b) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
114 return (vector float){ 0.0, a, 0.0, 0.0};
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
115 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
116
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
117 LCPI1_0: ; float
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
118 .space 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
119 .text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
120 .globl _test
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
121 .align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
122 _test:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
123 mfspr r2, 256
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
124 oris r3, r2, 4096
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
125 mtspr 256, r3
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
126 lis r3, ha16(LCPI1_0)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
127 addi r4, r1, -32
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
128 stfs f1, -16(r1)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
129 addi r5, r1, -16
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
130 lfs f0, lo16(LCPI1_0)(r3)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
131 stfs f0, -32(r1)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
132 lvx v2, 0, r4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
133 lvx v3, 0, r5
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
134 vmrghw v3, v3, v2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
135 vspltw v2, v2, 0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
136 vmrghw v2, v2, v3
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
137 mtspr 256, r2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
138 blr
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
139
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
140 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
141
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
142 int foo(vector float *x, vector float *y) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
143 if (vec_all_eq(*x,*y)) return 3245;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
144 else return 12;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
145 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
146
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
147 A predicate compare being used in a select_cc should have the same peephole
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
148 applied to it as a predicate compare used by a br_cc. There should be no
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
149 mfcr here:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
150
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
151 _foo:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
152 mfspr r2, 256
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
153 oris r5, r2, 12288
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
154 mtspr 256, r5
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
155 li r5, 12
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
156 li r6, 3245
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
157 lvx v2, 0, r4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
158 lvx v3, 0, r3
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
159 vcmpeqfp. v2, v3, v2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
160 mfcr r3, 2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
161 rlwinm r3, r3, 25, 31, 31
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
162 cmpwi cr0, r3, 0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
163 bne cr0, LBB1_2 ; entry
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
164 LBB1_1: ; entry
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
165 mr r6, r5
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
166 LBB1_2: ; entry
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
167 mr r3, r6
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
168 mtspr 256, r2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
169 blr
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
170
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
171 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
172
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
173 CodeGen/PowerPC/vec_constants.ll has an and operation that should be
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
174 codegen'd to andc. The issue is that the 'all ones' build vector is
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
175 SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
176 which prevents the vnot pattern from matching.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
177
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
178
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
179 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
180
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
181 An alternative to the store/store/load approach for illegal insert element
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
182 lowering would be:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
183
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
184 1. store element to any ol' slot
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
185 2. lvx the slot
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
186 3. lvsl 0; splat index; vcmpeq to generate a select mask
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
187 4. lvsl slot + x; vperm to rotate result into correct slot
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
188 5. vsel result together.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
189
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
190 //===----------------------------------------------------------------------===//
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
191
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
192 Should codegen branches on vec_any/vec_all to avoid mfcr. Two examples:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
193
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
194 #include <altivec.h>
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
195 int f(vector float a, vector float b)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
196 {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
197 int aa = 0;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
198 if (vec_all_ge(a, b))
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
199 aa |= 0x1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
200 if (vec_any_ge(a,b))
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
201 aa |= 0x2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
202 return aa;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
203 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
204
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
205 vector float f(vector float a, vector float b) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
206 if (vec_any_eq(a, b))
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
207 return a;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
208 else
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
209 return b;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
210 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
211
95
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
212 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
213
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
214 We should do a little better with eliminating dead stores.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
215 The stores to the stack are dead since %a and %b are not needed
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
216
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
217 ; Function Attrs: nounwind
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
218 define <16 x i8> @test_vpmsumb() #0 {
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
219 entry:
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
220 %a = alloca <16 x i8>, align 16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
221 %b = alloca <16 x i8>, align 16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
222 store <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, <16 x i8>* %a, align 16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
223 store <16 x i8> <i8 113, i8 114, i8 115, i8 116, i8 117, i8 118, i8 119, i8 120, i8 121, i8 122, i8 123, i8 124, i8 125, i8 126, i8 127, i8 112>, <16 x i8>* %b, align 16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
224 %0 = load <16 x i8>* %a, align 16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
225 %1 = load <16 x i8>* %b, align 16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
226 %2 = call <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8> %0, <16 x i8> %1)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
227 ret <16 x i8> %2
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
228 }
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
229
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
230
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
231 ; Function Attrs: nounwind readnone
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
232 declare <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8>, <16 x i8>) #1
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
233
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
234
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
235 Produces the following code with -mtriple=powerpc64-unknown-linux-gnu:
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
236 # BB#0: # %entry
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
237 addis 3, 2, .LCPI0_0@toc@ha
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
238 addis 4, 2, .LCPI0_1@toc@ha
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
239 addi 3, 3, .LCPI0_0@toc@l
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
240 addi 4, 4, .LCPI0_1@toc@l
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
241 lxvw4x 0, 0, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
242 addi 3, 1, -16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
243 lxvw4x 35, 0, 4
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
244 stxvw4x 0, 0, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
245 ori 2, 2, 0
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
246 lxvw4x 34, 0, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
247 addi 3, 1, -32
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
248 stxvw4x 35, 0, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
249 vpmsumb 2, 2, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
250 blr
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
251 .long 0
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
252 .quad 0
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
253
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
254 The two stxvw4x instructions are not needed.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
255 With -mtriple=powerpc64le-unknown-linux-gnu, the associated permutes
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
256 are present too.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
257
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
258 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
259
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
260 The following example is found in test/CodeGen/PowerPC/vec_add_sub_doubleword.ll:
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
261
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
262 define <2 x i64> @increment_by_val(<2 x i64> %x, i64 %val) nounwind {
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
263 %tmpvec = insertelement <2 x i64> <i64 0, i64 0>, i64 %val, i32 0
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
264 %tmpvec2 = insertelement <2 x i64> %tmpvec, i64 %val, i32 1
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
265 %result = add <2 x i64> %x, %tmpvec2
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
266 ret <2 x i64> %result
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
267
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
268 This will generate the following instruction sequence:
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
269 std 5, -8(1)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
270 std 5, -16(1)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
271 addi 3, 1, -16
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
272 ori 2, 2, 0
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
273 lxvd2x 35, 0, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
274 vaddudm 2, 2, 3
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
275 blr
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
276
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
277 This will almost certainly cause a load-hit-store hazard.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
278 Since val is a value parameter, it should not need to be saved onto
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
279 the stack, unless it's being done set up the vector register. Instead,
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
280 it would be better to splat the value into a vector register, and then
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
281 remove the (dead) stores to the stack.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
282
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
283 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
284
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
285 At the moment we always generate a lxsdx in preference to lfd, or stxsdx in
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
286 preference to stfd. When we have a reg-immediate addressing mode, this is a
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
287 poor choice, since we have to load the address into an index register. This
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
288 should be fixed for P7/P8.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
289
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
290 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
291
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
292 Right now, ShuffleKind 0 is supported only on BE, and ShuffleKind 2 only on LE.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
293 However, we could actually support both kinds on either endianness, if we check
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
294 for the appropriate shufflevector pattern for each case ... this would cause
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
295 some additional shufflevectors to be recognized and implemented via the
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
296 "swapped" form.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
297
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
298 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
299
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
300 There is a utility program called PerfectShuffle that generates a table of the
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
301 shortest instruction sequence for implementing a shufflevector operation on
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
302 PowerPC. However, this was designed for big-endian code generation. We could
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
303 modify this program to create a little endian version of the table. The table
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
304 is used in PPCISelLowering.cpp, PPCTargetLowering::LOWERVECTOR_SHUFFLE().
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
305
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
306 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
307
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
308 Opportunies to use instructions from PPCInstrVSX.td during code gen
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
309 - Conversion instructions (Sections 7.6.1.5 and 7.6.1.6 of ISA 2.07)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
310 - Scalar comparisons (xscmpodp and xscmpudp)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
311 - Min and max (xsmaxdp, xsmindp, xvmaxdp, xvmindp, xvmaxsp, xvminsp)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
312
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
313 Related to this: we currently do not generate the lxvw4x instruction for either
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
314 v4f32 or v4i32, probably because adding a dag pattern to the recognizer requires
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
315 a single target type. This should probably be addressed in the PPCISelDAGToDAG logic.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
316
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
317 //===----------------------------------------------------------------------===//
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
318
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
319 Currently EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT are type-legal only
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
320 for v2f64 with VSX available. We should create custom lowering
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
321 support for the other vector types. Without this support, we generate
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
322 sequences with load-hit-store hazards.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
323
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
324 v4f32 can be supported with VSX by shifting the correct element into
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
325 big-endian lane 0, using xscvspdpn to produce a double-precision
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
326 representation of the single-precision value in big-endian
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
327 double-precision lane 0, and reinterpreting lane 0 as an FPR or
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
328 vector-scalar register.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
329
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
330 v2i64 can be supported with VSX and P8Vector in the same manner as
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
331 v2f64, followed by a direct move to a GPR.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
332
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
333 v4i32 can be supported with VSX and P8Vector by shifting the correct
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
334 element into big-endian lane 1, using a direct move to a GPR, and
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
335 sign-extending the 32-bit result to 64 bits.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
336
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
337 v8i16 can be supported with VSX and P8Vector by shifting the correct
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
338 element into big-endian lane 3, using a direct move to a GPR, and
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
339 sign-extending the 16-bit result to 64 bits.
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
340
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
341 v16i8 can be supported with VSX and P8Vector by shifting the correct
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
342 element into big-endian lane 7, using a direct move to a GPR, and
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
343 sign-extending the 8-bit result to 64 bits.