LLVM 3.9

author	Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
date	Tue, 26 Jan 2016 22:53:40 +0900
//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.

//===----------------------------------------------------------------------===//

In the two examples below, the first should be a single lvx from the constant
pool, and the second should be an xor/stvx:

void bar(int *x);
void foo(void) {
  int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };
  bar (x);
}

#include <string.h>
void bar(int *x);
void foo(void) {
  int x[8] __attribute__((aligned(128)));
  memset (x, 0, sizeof (x));
  bar (x);
}

//===----------------------------------------------------------------------===//

Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763

Adding +0.0 is not an identity: if the product is -0.0, then -0.0 + 0.0 is
+0.0, whereas -0.0 + -0.0 stays -0.0. When -ffast-math is on, signed zeros
can be ignored and we can use 0.0.
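To make the signed-zero problem concrete, here is a minimal C sketch of one lane of a multiply lowered as a multiply-add. It assumes the default round-to-nearest mode and uses plain float arithmetic as a stand-in for vmaddfp; mul_as_madd is a hypothetical helper, not an LLVM or Altivec API.

```c
#include <math.h> /* signbit */

/* Hypothetical model of one vmaddfp lane used to implement x * y.
 * The product is computed first; for the signed-zero cases exercised
 * here, a fused and an unfused multiply-add give the same result. */
float mul_as_madd(float x, float y, float addend) {
    float product = x * y;      /* -1.0f * 0.0f yields -0.0f */
    return product + addend;    /* -0.0f + 0.0f rounds to +0.0f */
}
```

With an addend of -0.0, mul_as_madd(-1.0f, 0.0f, -0.0f) keeps the negative zero of the product, while an addend of 0.0 turns it into +0.0.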

//===----------------------------------------------------------------------===//

Consider this:

  v4f32 Vector;
  v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };

Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.
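A rough C model of why the splat can be a single lvewx plus vspltw: lvewx places the loaded word in the slot selected by the low address bits, so for a 16-byte aligned Vector the slot of .X is known at compile time. This sketch assumes big-endian element numbering and 32-bit elements (modeled as uint32_t); lvewx_slot and load_and_splat are hypothetical names.

```c
#include <stdint.h>

/* Word slot (0-3, big-endian element numbering) that lvewx fills:
 * (EA & 15) >> 2. */
unsigned lvewx_slot(uintptr_t ea) {
    return (unsigned)((ea & 15u) >> 2);
}

/* Hypothetical model of "lvewx + vspltw": load the word at p into the slot
 * its address selects, then splat that slot to all four lanes. The other
 * slots of a real lvewx result are undefined, so only `slot` is read. */
void load_and_splat(const uint32_t *p, uint32_t out[4]) {
    uint32_t reg[4] = {0, 0, 0, 0};
    unsigned slot = lvewx_slot((uintptr_t)p);
    reg[slot] = *p;                  /* lvewx */
    for (int i = 0; i < 4; ++i)      /* vspltw out, reg, slot */
        out[i] = reg[slot];
}
```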

//===----------------------------------------------------------------------===//

For functions that use Altivec AND have calls, we are VRSAVE'ing all
call-clobbered vector registers, not just the ones the function uses.
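A small sketch of the alternative, assuming the usual VRSAVE convention that the most significant bit stands for v0: compute the mask only from the vector registers a function really uses. vrsave_mask is a hypothetical helper; in practice the register list would come from the register allocator.

```c
#include <stdint.h>

/* VRSAVE bit i, counted from the most significant bit, marks v<i> live.
 * Hypothetical helper: build the mask from the vector registers a function
 * actually uses, rather than from every call-clobbered register. */
uint32_t vrsave_mask(const unsigned *regs, int n) {
    uint32_t mask = 0;
    for (int i = 0; i < n; ++i)
        mask |= UINT32_C(0x80000000) >> (regs[i] & 31u);  /* v0 is the MSB */
    return mask;
}
```

The upper halfwords of such masks match the oris immediates in the listings later in this file: 12288 (0x3000) marks v2 and v3, and 4096 (0x1000) marks v3 alone.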

//===----------------------------------------------------------------------===//

Implement passing vectors by value into calls and receiving them as arguments.

//===----------------------------------------------------------------------===//

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.
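The GCC strategy can be sketched as a byte-level C model of vperm, assuming big-endian word layout within the register. The helper names are hypothetical, and Variable is assumed to sit in word 0 of its register after the load.

```c
#include <stdint.h>

/* vperm: each control byte picks byte (ctl & 31) of the 32-byte a||b. */
void vperm(const uint8_t a[16], const uint8_t b[16], const uint8_t ctl[16],
           uint8_t out[16]) {
    for (int i = 0; i < 16; ++i) {
        unsigned idx = ctl[i] & 0x1Fu;
        out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* Word element k of a vector register, big-endian byte order. */
static void put_word(uint8_t v[16], int k, uint32_t w) {
    for (int j = 0; j < 4; ++j)
        v[4 * k + j] = (uint8_t)(w >> (24 - 8 * j));
}

static uint32_t get_word(const uint8_t v[16], int k) {
    uint32_t w = 0;
    for (int j = 0; j < 4; ++j)
        w = (w << 8) | v[4 * k + j];
    return w;
}

/* Hypothetical sketch: C1/C2/C3 come from one constant-pool vector,
 * Variable sits in word 0 of a second register, and one vperm (whose
 * control vector would also be a constant-pool entry) merges them. */
void build_c1_c2_var_c3(uint32_t c1, uint32_t c2, uint32_t c3, uint32_t var,
                        uint32_t out[4]) {
    uint8_t pool[16] = {0}, varvec[16] = {0}, res[16];
    static const uint8_t ctl[16] = { 0, 1, 2, 3,     4, 5, 6, 7,
                                     16, 17, 18, 19, 12, 13, 14, 15 };
    put_word(pool, 0, c1);
    put_word(pool, 1, c2);
    put_word(pool, 3, c3);        /* word 2 of the pool entry is unused */
    put_word(varvec, 0, var);     /* the loaded Variable */
    vperm(pool, varvec, ctl, res);
    for (int k = 0; k < 4; ++k)
        out[k] = get_word(res, k);
}
```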

//===----------------------------------------------------------------------===//

We need a way to teach tblgen that some operands of an intrinsic are required to
be constants. The verifier should enforce this constraint.

//===----------------------------------------------------------------------===//

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a load/vperm. We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it. If the value is already
in memory, this is a big win.
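A sketch of the lvsl/vperm idea as a byte-level C model (the helpers are hypothetical stand-ins for the instructions). Because a 4-byte-aligned scalar slot never straddles a 16-byte boundary, one lvx of the enclosing quadword contains it, and permuting that quadword with the lvsl control vector rotates the scalar into element 0.

```c
#include <stdint.h>
#include <string.h>

/* lvsl: control vector {sh, sh+1, ..., sh+15} with sh = EA & 15. */
void lvsl(uintptr_t ea, uint8_t ctl[16]) {
    unsigned sh = (unsigned)(ea & 15u);
    for (int i = 0; i < 16; ++i)
        ctl[i] = (uint8_t)(sh + i);
}

/* vperm: each control byte picks byte (ctl & 31) of the 32-byte a||b. */
void vperm(const uint8_t a[16], const uint8_t b[16], const uint8_t ctl[16],
           uint8_t out[16]) {
    for (int i = 0; i < 16; ++i) {
        unsigned idx = ctl[i] & 0x1Fu;
        out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* lvx ignores the low four address bits: it loads the aligned quadword. */
void lvx(const void *ea, uint8_t out[16]) {
    memcpy(out, (const void *)((uintptr_t)ea & ~(uintptr_t)15), 16);
}

/* Hypothetical SCALAR_TO_VECTOR from a 4-byte-aligned scalar slot:
 * lvx + lvsl + vperm rotate the scalar's bytes into element 0. */
void scalar_to_vector(const uint32_t *slot, uint8_t out[16]) {
    uint8_t q[16], ctl[16];
    lvx(slot, q);
    lvsl((uintptr_t)slot, ctl);
    vperm(q, q, ctl, out);
}
```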

//===----------------------------------------------------------------------===//

extract_vector_elt of an arbitrary vector with a constant element index can be
done with the following instructions:

  vTemp = vec_splat(v0, 2);     // 2 is the element the src is in.
  vec_ste(vTemp, 0, &destloc);

The splat copies the element into every slot, so whichever slot vec_ste
(stve*x) selects from the low bits of &destloc holds the right value. For a
non-constant element index, we can use lvsr/vperm/ste instead.
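A C model of the splat-then-store sequence, assuming 32-bit elements (so vec_ste becomes stvewx) and big-endian slot numbering. vspltw, stvewx and extract_to here are hypothetical model functions, not the real intrinsics; the point is that after the splat, the stored value does not depend on where destloc sits within its quadword.

```c
#include <stdint.h>

/* vspltw: copy word element `elt` into all four lanes. */
void vspltw(const uint32_t v[4], unsigned elt, uint32_t out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = v[elt & 3u];
}

/* stvewx: store the word in slot (EA & 15) >> 2 to the (word-aligned) EA. */
void stvewx(const uint32_t v[4], uint32_t *ea) {
    unsigned slot = (unsigned)(((uintptr_t)ea & 15u) >> 2);
    *ea = v[slot];
}

/* Hypothetical extract with a constant index: splat, then store.
 * Every slot holds the element, so dest may sit anywhere in its quadword. */
void extract_to(const uint32_t v[4], unsigned elt, uint32_t *dest) {
    uint32_t tmp[4];
    vspltw(v, elt, tmp);
    stvewx(tmp, dest);
}
```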

//===----------------------------------------------------------------------===//

If we want to tie instruction selection into the scheduler, we can form some
constants with different instructions. For example, we can generate
"vsplti -1" with "vcmpequw R,R", 1,1,1,1 with "vsubcuw R,R", and 0,0,0,0 with
"vsplti 0" or "vxor R,R". Each of these uses a different execution unit, which
could help scheduling.

This is probably only reasonable for a post-pass scheduler.
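Per-lane C models of the three instructions applied to the same register R, to check the constants they produce. This is a sketch with hypothetical function names; vsubcuw is modeled as the carry out of a + ~b + 1, which is always 1 when both operands are equal.

```c
#include <stdint.h>

/* One 32-bit lane of each instruction, applied as "op R,R". */
uint32_t vcmpequw_lane(uint32_t a, uint32_t b) {
    return a == b ? 0xFFFFFFFFu : 0u;     /* all ones (-1) when equal */
}

/* vsubcuw: carry out of a + ~b + 1, which is 1 exactly when a >= b. */
uint32_t vsubcuw_lane(uint32_t a, uint32_t b) {
    uint64_t sum = (uint64_t)a + (uint64_t)(uint32_t)~b + 1u;
    return (uint32_t)(sum >> 32);
}

uint32_t vxor_lane(uint32_t a, uint32_t b) {
    return a ^ b;                         /* 0 when a == b */
}
```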

//===----------------------------------------------------------------------===//

For this function:

void test(vector float *A, vector float *B) {
  vector float C = (vector float)vec_cmpeq(*A, *B);
  if (!vec_any_eq(*A, *B))
    *B = (vector float){0,0,0,0};
  *A = C;
}

we get the following basic block:

        ...
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp v4, v3, v2
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; cond_next

The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the
vcmpeqfp. result is used by a branch. This can be improved.
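A C model of what vcmpeqfp. computes, assuming the standard CR6 encoding of record-form vector compares (LT set when the compare is true in every lane, EQ set when it is true in none). One instruction provides both the mask stored to *A and the summary bit that bne cr6 tests. Names are hypothetical.

```c
#include <stdint.h>

/* 4-bit CR6 field after a record-form ("dot") vector compare:
 * LT (0x8) = true in all lanes, EQ (0x2) = true in no lane. */
enum { CR6_ALL_TRUE = 0x8, CR6_ALL_FALSE = 0x2 };

/* Hypothetical model of vcmpeqfp.: one instruction yields both the lane
 * mask (used for C) and the CR6 summary (used by the branch). */
unsigned vcmpeqfp_dot(const float a[4], const float b[4], uint32_t mask[4]) {
    int all_true = 1, all_false = 1;
    for (int i = 0; i < 4; ++i) {
        int eq = a[i] == b[i];
        mask[i] = eq ? 0xFFFFFFFFu : 0u;
        all_true &= eq;
        all_false &= !eq;
    }
    return (all_true ? CR6_ALL_TRUE : 0u) | (all_false ? CR6_ALL_FALSE : 0u);
}

/* vec_any_eq is "not all false", which is what `bne cr6` tests. */
int any_eq_from_cr6(unsigned cr6) {
    return (cr6 & CR6_ALL_FALSE) == 0;
}
```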

//===----------------------------------------------------------------------===//

The code generated for this is truly awful:

vector float test(float a, float b) {
  return (vector float){ 0.0, a, 0.0, 0.0};
}

LCPI1_0:                                        ; float
        .space  4
        .text
        .globl  _test
        .align  4
_test:
        mfspr r2, 256
        oris r3, r2, 4096
        mtspr 256, r3
        lis r3, ha16(LCPI1_0)
        addi r4, r1, -32
        stfs f1, -16(r1)
        addi r5, r1, -16
        lfs f0, lo16(LCPI1_0)(r3)
        stfs f0, -32(r1)
        lvx v2, 0, r4
        lvx v3, 0, r5
        vmrghw v3, v3, v2
        vspltw v2, v2, 0
        vmrghw v2, v2, v3
        mtspr 256, r2
        blr
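To see what the three shuffles do, here is a C replay of the listing with hypothetical vmrghw/vspltw models. It assumes big-endian word numbering and a 16-byte aligned r1, so each stored float lands in word 0 of its slot; junk stands in for the uninitialized stack words each lvx also picks up.

```c
/* Four 32-bit float lanes, element 0 first (big-endian numbering). */
typedef struct { float w[4]; } vec4;

/* vmrghw vD, vA, vB: interleave the high (first) two words. */
vec4 vmrghw(vec4 a, vec4 b) {
    vec4 r = {{ a.w[0], b.w[0], a.w[1], b.w[1] }};
    return r;
}

/* vspltw vD, vB, i: copy word i into every lane. */
vec4 vspltw(vec4 v, int i) {
    vec4 r = {{ v.w[i], v.w[i], v.w[i], v.w[i] }};
    return r;
}

/* Replays the listing for test(a, b). */
vec4 replay_test(float a, float junk) {
    vec4 v3 = {{ a, junk, junk, junk }};     /* lvx v3, 0, r5 */
    vec4 v2 = {{ 0.0f, junk, junk, junk }};  /* lvx v2, 0, r4 */
    v3 = vmrghw(v3, v2);                      /* { a, 0, junk, junk } */
    v2 = vspltw(v2, 0);                       /* { 0, 0, 0, 0 } */
    return vmrghw(v2, v3);                    /* { 0, a, 0, 0 } */
}
```

The result is correct because the final vmrghw only reads the first two words of each input, but it takes two stores, two loads and three permutes to place one float.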

//===----------------------------------------------------------------------===//

int foo(vector float *x, vector float *y) {
  if (vec_all_eq(*x, *y)) return 3245;
  else return 12;
}

A predicate compare being used in a select_cc should have the same peephole
applied to it as a predicate compare used by a br_cc. There should be no
mfcr here:

_foo:
        mfspr r2, 256
        oris r5, r2, 12288
        mtspr 256, r5
        li r5, 12
        li r6, 3245
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3, 2
        rlwinm r3, r3, 25, 31, 31
        cmpwi cr0, r3, 0
        bne cr0, LBB1_2 ; entry
LBB1_1: ; entry
        mr r6, r5
LBB1_2: ; entry
        mr r3, r6
        mtspr 256, r2
        blr
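A C model of the mfcr/rlwinm pair, using hypothetical helpers with bits numbered from the MSB as in the ISA. rlwinm r3, r3, 25, 31, 31 rotates CR bit 24, the CR6 "all true" bit that vec_all_eq tests, into the low bit of r3; the cmpwi/bne pair then only re-tests a bit that a branch on cr6 could read directly.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* rlwinm rD, rS, SH, MB, ME for MB <= ME (bit 0 is the MSB). */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
    uint32_t mask = (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
    return rotl32(rs, sh) & mask;
}

/* Condition-register bit b, numbered from the MSB (CR6 = bits 24..27). */
unsigned cr_bit(uint32_t cr, unsigned b) {
    return (cr >> (31u - b)) & 1u;
}
```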

//===----------------------------------------------------------------------===//

CodeGen/PowerPC/vec_constants.ll has an and operation that should be
codegen'd to andc. The issue is that the 'all ones' build vector is
SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected,
which prevents the vnot pattern from matching.
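A one-lane C sketch of the identity the vnot pattern is meant to catch, assuming vnot is an xor with the all-ones build vector; the function names are hypothetical.

```c
#include <stdint.h>

/* vnot modeled as xor with the all-ones build vector (vspltisb -1). */
uint32_t vnot_lane(uint32_t b) { return b ^ 0xFFFFFFFFu; }

/* and + xor, as selected when the all-ones vector is matched first. */
uint32_t and_xor_lane(uint32_t a, uint32_t b) { return a & vnot_lane(b); }

/* What the vnot pattern should produce instead: a single andc. */
uint32_t vandc_lane(uint32_t a, uint32_t b) { return a & ~b; }
```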

//===----------------------------------------------------------------------===//

An alternative to the store/store/load approach for illegal insert element
lowering would be:

1. store element to any ol' slot
2. lvx the slot
3. lvsl 0; splat index; vcmpeq to generate a select mask
4. lvsl slot + x; vperm to rotate result into correct slot
5. vsel result together.
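The five steps can be modeled in C for a v16i8 vector. This is a sketch with hypothetical byte-wise stand-ins for lvx, lvsl, vperm and vsel; in particular, it assumes step 4 forms the lvsl operand so that the rotate amount is (slot - idx) mod 16, which moves the stored byte from its slot to position idx.

```c
#include <stdint.h>
#include <string.h>

/* lvsl: control vector {sh, sh+1, ..., sh+15}, sh taken mod 16. */
static void lvsl(unsigned sh, uint8_t ctl[16]) {
    for (int i = 0; i < 16; ++i)
        ctl[i] = (uint8_t)((sh & 15u) + i);
}

/* vperm: each control byte picks byte (ctl & 31) of the 32-byte a||b. */
static void vperm(const uint8_t a[16], const uint8_t b[16],
                  const uint8_t ctl[16], uint8_t out[16]) {
    for (int i = 0; i < 16; ++i) {
        unsigned idx = ctl[i] & 0x1Fu;
        out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* vsel: take b where the mask bit is 1, a where it is 0. */
static void vsel(const uint8_t a[16], const uint8_t b[16],
                 const uint8_t m[16], uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)((a[i] & ~m[i]) | (b[i] & m[i]));
}

/* Hypothetical v16i8 insert of `elt` at index `idx`, storing the element at
 * byte offset `slot` of an aligned quadword (step 1 allows any slot). */
void insert_byte(const uint8_t vec[16], uint8_t elt, unsigned idx,
                 unsigned slot, uint8_t out[16]) {
    uint8_t qw[16], ident[16], mask[16], ctl[16], rotated[16];
    memset(qw, 0, sizeof qw);        /* other bytes are don't-care */
    qw[slot & 15u] = elt;            /* 1. store element to the slot */
                                     /* 2. lvx the slot: qw is the quadword */
    lvsl(0, ident);                  /* 3. lvsl 0 = {0, 1, ..., 15}; */
    for (int i = 0; i < 16; ++i)     /*    splat idx and vcmpequb */
        mask[i] = ident[i] == idx ? 0xFF : 0x00;
    lvsl(slot - idx, ctl);           /* 4. rotate left by (slot - idx) mod 16 */
    vperm(qw, qw, ctl, rotated);     /*    so elt lands at byte idx */
    vsel(vec, rotated, mask, out);   /* 5. merge */
}
```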

//===----------------------------------------------------------------------===//

We should codegen branches on vec_any/vec_all predicates directly on CR6 to
avoid mfcr. Two examples:

#include <altivec.h>
int f(vector float a, vector float b)
{
  int aa = 0;
  if (vec_all_ge(a, b))
    aa |= 0x1;
  if (vec_any_ge(a, b))
    aa |= 0x2;
  return aa;
}

vector float f(vector float a, vector float b) {
  if (vec_any_eq(a, b))
    return a;
  else
    return b;
}
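A C model of the first example, assuming (as in common altivec.h implementations) that vec_all_ge and vec_any_ge on vector float both use vcmpgefp. and test CR6 LT and "EQ clear" respectively. One compare then feeds both branches, and no mfcr is needed. Names are hypothetical.

```c
/* 4-bit CR6 field after vcmpgefp.: LT (0x8) = a >= b in all lanes,
 * EQ (0x2) = a >= b in no lane. */
unsigned vcmpgefp_dot_cr6(const float a[4], const float b[4]) {
    int all_true = 1, all_false = 1;
    for (int i = 0; i < 4; ++i) {
        int ge = a[i] >= b[i];
        all_true &= ge;
        all_false &= !ge;
    }
    return (all_true ? 0x8u : 0u) | (all_false ? 0x2u : 0u);
}

/* Hypothetical model of the first f(): one compare feeds both branches. */
int f_model(const float a[4], const float b[4]) {
    unsigned cr6 = vcmpgefp_dot_cr6(a, b);
    int aa = 0;
    if (cr6 & 0x8u)        /* vec_all_ge: branch on CR6 LT */
        aa |= 0x1;
    if (!(cr6 & 0x2u))     /* vec_any_ge: branch on CR6 EQ clear */
        aa |= 0x2;
    return aa;
}
```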

//===----------------------------------------------------------------------===//

We should do a little better with eliminating dead stores.
The stores to the stack are dead since %a and %b are not needed:

; Function Attrs: nounwind
define <16 x i8> @test_vpmsumb() #0 {
entry:
  %a = alloca <16 x i8>, align 16
  %b = alloca <16 x i8>, align 16
  store <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, <16 x i8>* %a, align 16
  store <16 x i8> <i8 113, i8 114, i8 115, i8 116, i8 117, i8 118, i8 119, i8 120, i8 121, i8 122, i8 123, i8 124, i8 125, i8 126, i8 127, i8 112>, <16 x i8>* %b, align 16
  %0 = load <16 x i8>, <16 x i8>* %a, align 16
  %1 = load <16 x i8>, <16 x i8>* %b, align 16
  %2 = call <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8> %0, <16 x i8> %1)
  ret <16 x i8> %2
}

; Function Attrs: nounwind readnone
declare <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8>, <16 x i8>) #1

attributes #0 = { nounwind }
attributes #1 = { nounwind readnone }

This produces the following code with -mtriple=powerpc64-unknown-linux-gnu:
# BB#0:                                 # %entry
        addis 3, 2, .LCPI0_0@toc@ha
        addis 4, 2, .LCPI0_1@toc@ha
        addi 3, 3, .LCPI0_0@toc@l
        addi 4, 4, .LCPI0_1@toc@l
        lxvw4x 0, 0, 3
        addi 3, 1, -16
        lxvw4x 35, 0, 4
        stxvw4x 0, 0, 3
        ori 2, 2, 0
        lxvw4x 34, 0, 3
        addi 3, 1, -32
        stxvw4x 35, 0, 3
        vpmsumb 2, 2, 3
        blr
        .long   0
        .quad   0

The two stxvw4x instructions are not needed.
With -mtriple=powerpc64le-unknown-linux-gnu, the associated permutes
are present too.

//===----------------------------------------------------------------------===//

The following example is found in test/CodeGen/PowerPC/vec_add_sub_doubleword.ll:

define <2 x i64> @increment_by_val(<2 x i64> %x, i64 %val) nounwind {
  %tmpvec = insertelement <2 x i64> <i64 0, i64 0>, i64 %val, i32 0
  %tmpvec2 = insertelement <2 x i64> %tmpvec, i64 %val, i32 1
  %result = add <2 x i64> %x, %tmpvec2
  ret <2 x i64> %result
}
afa8332a0e37 LLVM 3.8 Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> parents: 0 diff changeset	267
This generates the following instruction sequence:
std 5, -8(1)
std 5, -16(1)
addi 3, 1, -16
ori 2, 2, 0
lxvd2x 35, 0, 3
vaddudm 2, 2, 3
blr

This will almost certainly cause a load-hit-store hazard: the lxvd2x reads
the two doublewords that the std instructions have just written. Since val
is passed by value in a register (r5), it should not need to be saved onto
the stack, except as a way of building the vector register. Instead, it
would be better to splat the value into a vector register and then remove
the (dead) stores to the stack.
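
For comparison, a C function with the same semantics, written with the
GCC/Clang vector_size extension, is sketched below; the names v2i64 and
increment_by_val only mirror the IR and are otherwise hypothetical. On a
POWER8 target, one sequence that avoids the stack round trip is a direct
move of r5 into a vector-scalar register followed by a doubleword splat
(mtvsrd, then xxspltd). That sequence is a suggestion, not a description
of what the backend currently emits.

```c
#include <stdint.h>

/* Two 64-bit lanes in one 16-byte vector, matching <2 x i64>. */
typedef int64_t v2i64 __attribute__((vector_size(16)));

/* Same semantics as @increment_by_val: place val in both lanes (the two
   insertelement operations), then do a lane-wise add (vaddudm on P8). */
v2i64 increment_by_val(v2i64 x, int64_t val) {
  v2i64 splat = {val, val};
  return x + splat;
}
```

Ideally the splat stays entirely in registers, so nothing is written to
the stack and the lxvd2x disappears.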

//===----------------------------------------------------------------------===//

At the moment we always generate an lxsdx in preference to lfd, or an stxsdx
in preference to stfd. When we have a reg-immediate addressing mode, this is
a poor choice: lxsdx and stxsdx support only indexed (reg+reg) addressing, so
the immediate offset must first be loaded into an index register. This
should be fixed for P7/P8.
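
A minimal C case of the reg-immediate situation follows (the function name
is hypothetical). The element lies at a constant 16-byte offset from the
pointer argument, so a D-form lfd with displacement 16 can load it directly,
whereas the X-form lxsdx needs the offset in a register first. The sequences
in the comment assume the ELF ABI passes p in r3 and returns the result in
f1; they are illustrative, not observed compiler output.

```c
/* p arrives in r3; the result is returned in f1.
   Preferred (D-form):   lfd 1, 16(3)
   Indexed form only:    li 4, 16
                         lxsdx 1, 3, 4                                  */
double load_third(const double *p) {
  return p[2];
}
```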

//===----------------------------------------------------------------------===//

Right now, ShuffleKind 0 is supported only on BE, and ShuffleKind 2 only on LE.
However, we could support both kinds on either endianness if we checked for
the appropriate shufflevector pattern in each case. This would allow some
additional shufflevectors to be recognized and implemented via the "swapped"
form.
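
The operand-swap part of such a check can be sketched in C. This is a
hypothetical model, not the PPCISelLowering code: the expected pattern used
as an example is the big-endian vpkuhum one (odd-numbered bytes 1, 3, ..., 31
of the concatenated inputs, i.e. the low-order byte of each halfword), and a
mask is accepted either as written or after exchanging the two operands. A
real implementation must also account for the different element numbering on
LE, which this sketch does not model.

```c
#include <stdbool.h>

/* Byte-shuffle mask over two 16-byte inputs: entries 0-15 select bytes
   of the first operand, 16-31 bytes of the second; -1 means undef. */
enum { NELTS = 16, UNDEF = -1 };

/* True if every defined mask entry equals the expected index. */
static bool mask_matches(const int mask[NELTS], const int expected[NELTS]) {
  for (int i = 0; i < NELTS; ++i)
    if (mask[i] != UNDEF && mask[i] != expected[i])
      return false;
  return true;
}

/* Rewrite the mask as if the operands were exchanged, so that
   shuffle(A, B, mask) selects the same bytes as shuffle(B, A, out). */
static void commute_mask(const int mask[NELTS], int out[NELTS]) {
  for (int i = 0; i < NELTS; ++i)
    out[i] = mask[i] == UNDEF ? UNDEF
           : mask[i] < NELTS  ? mask[i] + NELTS
                              : mask[i] - NELTS;
}

/* Hypothetical matcher: 0 if the mask matches as written, 1 if it matches
   only with the operands swapped, -1 if neither. */
static int match_either_order(const int mask[NELTS],
                              const int expected[NELTS]) {
  int swapped[NELTS];
  if (mask_matches(mask, expected))
    return 0;
  commute_mask(mask, swapped);
  if (mask_matches(swapped, expected))
    return 1;
  return -1;
}
```

A return value of 1 corresponds to emitting the instruction with its two
source operands exchanged.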

//===----------------------------------------------------------------------===//

There is a utility program called PerfectShuffle that generates a table of the
shortest instruction sequences for implementing shufflevector operations on
PowerPC. However, it was designed for big-endian code generation. We could
modify this program to create a little-endian version of the table. The table
is used in PPCISelLowering.cpp, in PPCTargetLowering::LowerVECTOR_SHUFFLE().
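
Whatever form the little-endian support takes, it rests on the relation
between the two element numberings. The hypothetical sketch below assumes
that LE element k of a four-word register occupies big-endian word slot 3-k
and that operand order is unchanged; under that assumption it rewrites an LE
mask into BE numbering and computes an index using the table's base-9
encoding (digits 0-7 select a word of the two inputs, 8 is undef). Converting
masks this way is an alternative to generating a second table, not the
approach proposed above.

```c
#include <assert.h>

enum { UNDEF_ELT = 8 }; /* undef mask entry in the table's encoding */

/* Map one entry of a two-input, four-element mask from LE to BE
   numbering: element k of an operand becomes element 3-k. */
static unsigned le_to_be_elt(unsigned m) {
  assert(m <= UNDEF_ELT);
  if (m == UNDEF_ELT)
    return UNDEF_ELT;
  unsigned op = m / 4, k = m % 4;
  return op * 4 + (3 - k);
}

/* Result element i in LE numbering is result element 3-i in BE, so the
   BE mask is the reversed list of converted entries. */
static void le_mask_to_be(const unsigned le[4], unsigned be[4]) {
  for (unsigned j = 0; j < 4; ++j)
    be[j] = le_to_be_elt(le[3 - j]);
}

/* Four base-9 digits, first mask entry most significant. */
static unsigned pf_table_index(const unsigned m[4]) {
  return m[0] * 9 * 9 * 9 + m[1] * 9 * 9 + m[2] * 9 + m[3];
}
```

For example, an LE splat of element 0, mask {0, 0, 0, 0}, becomes the BE
mask {3, 3, 3, 3}, while the identity mask {0, 1, 2, 3} is unchanged.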

//===----------------------------------------------------------------------===//

Opportunities to use instructions from PPCInstrVSX.td during code gen:
- Conversion instructions (Sections 7.6.1.5 and 7.6.1.6 of ISA 2.07)
- Scalar comparisons (xscmpodp and xscmpudp)
- Min and max (xsmaxdp, xsmindp, xvmaxdp, xvmindp, xvmaxsp, xvminsp)

Related to this: we currently do not generate the lxvw4x instruction for either
v4f32 or v4i32, probably because adding a DAG pattern to the recognizer requires
a single target type. This should probably be addressed in the PPCISelDAGToDAG
logic.

//===----------------------------------------------------------------------===//

Currently EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT are type-legal only
for v2f64 with VSX available. We should create custom lowering
support for the other vector types. Without this support, we generate
sequences with load-hit-store hazards.

v4f32 can be supported with VSX by shifting the correct element into
big-endian lane 0, using xscvspdpn to produce a double-precision
representation of the single-precision value in big-endian
double-precision lane 0, and reinterpreting lane 0 as an FPR or
vector-scalar register.

v2i64 can be supported with VSX and P8Vector in the same manner as
v2f64, followed by a direct move to a GPR.

v4i32 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 1, using a direct move to a GPR, and
sign-extending the 32-bit result to 64 bits.

v8i16 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 3, using a direct move to a GPR, and
sign-extending the 16-bit result to 64 bits.

v16i8 can be supported with VSX and P8Vector by shifting the correct
element into big-endian lane 7, using a direct move to a GPR, and
sign-extending the 8-bit result to 64 bits.
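
The integer recipes above share one idea: rotate the wanted element into the
lane that occupies the low-order bits of big-endian doubleword 0 (lane 7 for
bytes, 3 for halfwords, 1 for words, 0 for doublewords), move that doubleword
to a GPR, then sign-extend. The sketch below models this in portable C on a
16-byte array in big-endian byte order; it does not emit PowerPC code. It
assumes vsldoi of a register with itself as the rotate and mfvsrd as the
direct move of doubleword 0; vreg and extract_signed are hypothetical names,
and the v4f32 xscvspdpn path is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 16-byte vector register in big-endian byte order:
   b[0] is the most significant byte of doubleword 0. */
typedef struct { uint8_t b[16]; } vreg;

/* vsldoi vD,vA,vA,sh: rotate the register left by sh bytes. */
static vreg rotate_left_bytes(vreg v, unsigned sh) {
  vreg out;
  for (unsigned j = 0; j < 16; ++j)
    out.b[j] = v.b[(j + sh) % 16];
  return out;
}

/* Direct move of doubleword 0 to a GPR (mfvsrd). */
static uint64_t move_dw0_to_gpr(vreg v) {
  uint64_t d = 0;
  for (unsigned j = 0; j < 8; ++j)
    d = (d << 8) | v.b[j];
  return d;
}

/* Extract element idx (big-endian numbering) of width w bytes. */
static int64_t extract_signed(vreg v, unsigned idx, unsigned w) {
  assert(w == 1 || w == 2 || w == 4 || w == 8);
  assert(idx < 16 / w);
  unsigned lane = 8 / w - 1;          /* low-order lane of doubleword 0 */
  unsigned sh = (idx * w + 16 - lane * w) % 16;
  uint64_t d = move_dw0_to_gpr(rotate_left_bytes(v, sh));
  unsigned bits = 8 * w;
  if (bits == 64)
    return (int64_t)d;
  /* Sign-extend the low bits, as extsb/extsh/extsw would. */
  uint64_t low = d & ((UINT64_C(1) << bits) - 1);
  uint64_t sign = UINT64_C(1) << (bits - 1);
  return (int64_t)((low ^ sign) - sign);
}
```

The final casts rely on GCC/Clang converting out-of-range unsigned values to
signed by wrapping, which C leaves implementation-defined.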