//===---------------------------------------------------------------------===//
// Random notes about and ideas for the SystemZ backend.
//===---------------------------------------------------------------------===//

The initial backend is deliberately restricted to z10. We should add support
for later architectures at some point.

--

SystemZDAGToDAGISel::SelectInlineAsmMemoryOperand() is passed "m" for all
inline asm memory constraints; it doesn't get to see the original constraint.
This means that it must conservatively treat all inline asm constraints
as the most restricted type, "R".
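
For example, in a function like the following (an illustrative C sketch;
the name is made up), the selector sees only "m" for the operand:

  int load_it(int *p)
  {
    int x;
    /* The "m" constraint reaches the backend with no further detail,
       so it must be treated as the most restricted type, "R". */
    asm("l %0, %1" : "=r"(x) : "m"(*p));
    return x;
  }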

--

If an inline asm ties an i32 "r" result to an i64 input, the input
will be treated as an i32, leaving the upper bits uninitialised.
For example:

  define void @f4(i32 *%dst) {
    %val = call i32 asm "blah $0", "=r,0" (i64 103)
    store i32 %val, i32 *%dst
    ret void
  }

from CodeGen/SystemZ/asm-09.ll will use LHI rather than LGHI to load 103.
This seems to be a general target-independent problem.

--

The tuning of the choice between LOAD ADDRESS (LA) and addition in
SystemZISelDAGToDAG.cpp is suspect. It should be tweaked based on
performance measurements.

--

There is no scheduling support.

--

We don't use the BRANCH ON INDEX instructions.
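
A counted loop like the following (an illustrative sketch) is the kind
of pattern they could fuse into a single add-compare-branch:

  void bump(int *a, long n)
  {
    /* The induction-variable update, compare and branch could become
       one BRANCH ON INDEX (e.g. BRXLE) on the counter. */
    for (long i = 0; i < n; i++)
      a[i] += 1;
  }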

--

We might want to use BRANCH ON CONDITION for conditional indirect calls
and conditional returns.
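
For example, an early return like this one (an illustrative sketch)
could become a BRANCH ON CONDITION to %r14:

  void store_nonzero(int *p, int x)
  {
    /* The early return could be a conditional branch to %r14 rather
       than a branch around the store. */
    if (x == 0)
      return;
    *p = x;
  }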

--

We don't use the TEST DATA CLASS instructions.
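
A class test like the following (an illustrative sketch) currently goes
through an ordinary comparison:

  int is_nan(double x)
  {
    /* Could be a single TEST DATA CLASS against the NaN classes
       instead of comparing x with itself. */
    return __builtin_isnan(x);
  }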

--

We could use the generic floating-point forms of LOAD COMPLEMENT,
LOAD NEGATIVE and LOAD POSITIVE in cases where we don't need the
condition codes. For example, we could use LCDFR instead of LCDBR.
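
For instance, a plain negation (an illustrative sketch) never tests
the condition code:

  double negate(double x)
  {
    /* The result of the negation is not compared against anything,
       so LCDFR would do instead of the cc-setting LCDBR. */
    return -x;
  }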

--

We only use MVC, XC and CLC for constant-length block operations.
We could extend them to variable-length operations too,
using EXECUTE RELATIVE LONG.

MVCIN, MVCLE and CLCLE may be worthwhile too.
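
A bounded variable-length copy such as this (an illustrative sketch)
is the kind of case EXECUTE RELATIVE LONG could handle, by supplying
the MVC length from a register:

  #include <string.h>

  void copy_small(char *dst, const char *src, unsigned long len)
  {
    /* For 1 <= len <= 256, this could be an EXRL of a single MVC
       whose length field is supplied at execution time. */
    if (len >= 1 && len <= 256)
      memcpy(dst, src, len);
  }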

--

We don't use CUSE or the TRANSLATE family of instructions for string
operations. The TRANSLATE ones are probably more difficult to exploit.
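
A byte-translation loop like the following (an illustrative sketch) is
the pattern TRANSLATE itself targets:

  void map_bytes(unsigned char *s, const unsigned char table[256],
                 unsigned long n)
  {
    /* Each byte is replaced by its table entry, which is what TR
       does for blocks of up to 256 bytes at a time. */
    for (unsigned long i = 0; i < n; i++)
      s[i] = table[s[i]];
  }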

--

We don't take full advantage of builtins like fabsl because the calling
conventions require f128s to be returned by invisible reference.
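
For example (an illustrative sketch):

  #include <math.h>

  long double absval(long double x)
  {
    /* The f128 argument and result both travel by hidden reference,
       so this cannot simply become a register LOAD POSITIVE. */
    return fabsl(x);
  }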

--

ADD LOGICAL WITH SIGNED IMMEDIATE could be useful when we need to
produce a carry. SUBTRACT LOGICAL IMMEDIATE could be useful when we
need to produce a borrow. (Note that there are no memory forms of
ADD LOGICAL WITH CARRY and SUBTRACT LOGICAL WITH BORROW, so the high
part of 128-bit memory operations would probably need to be done
via a register.)
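
For example, a 128-bit increment in memory (an illustrative sketch)
needs the carry:

  void inc128(unsigned __int128 *p)
  {
    /* The low doubleword could use the storage form of ADD LOGICAL
       WITH SIGNED IMMEDIATE, which leaves the carry in the condition
       code; the high doubleword would still go through a register. */
    *p += 1;
  }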

--

We don't use the halfword forms of LOAD REVERSED and STORE REVERSED
(LRVH and STRVH).
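
For example (an illustrative sketch):

  unsigned short load_swapped(const unsigned short *p)
  {
    /* Could be a single LRVH instead of a halfword load followed by
       a separate byte swap. */
    return __builtin_bswap16(*p);
  }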

--

We don't use ICM or STCM.
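
A 3-byte big-endian load such as this (an illustrative sketch) is a
typical ICM pattern:

  unsigned int load3(const unsigned char *p)
  {
    /* ICM with mask 0111 could insert the three bytes directly into
       the low 24 bits of the result register. */
    return ((unsigned int)p[0] << 16)
         | ((unsigned int)p[1] << 8)
         | p[2];
  }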

--

DAGCombiner doesn't yet fold truncations of extended loads. Functions like:

  unsigned long f (unsigned long x, unsigned short *y)
  {
    return (x << 32) | *y;
  }

therefore end up as:

  sllg %r2, %r2, 32
  llgh %r0, 0(%r3)
  lr %r2, %r0
  br %r14

but truncating the load would give:

  sllg %r2, %r2, 32
  lh %r2, 0(%r3)
  br %r14

--

Functions like:

  define i64 @f1(i64 %a) {
    %and = and i64 %a, 1
    ret i64 %and
  }

ought to be implemented as:

  lhi %r0, 1
  ngr %r2, %r0
  br %r14

but two-address optimisations reverse the order of the AND and force:

  lhi %r0, 1
  ngr %r0, %r2
  lgr %r2, %r0
  br %r14

CodeGen/SystemZ/and-04.ll has several examples of this.

--

Out-of-range displacements are usually handled by loading the full
address into a register. In many cases it would be better to create
an anchor point instead. E.g. for:

  define void @f4a(i128 *%aptr, i64 %base) {
    %addr = add i64 %base, 524288
    %bptr = inttoptr i64 %addr to i128 *
    %a = load volatile i128 *%aptr
    %b = load i128 *%bptr
    %add = add i128 %a, %b
    store i128 %add, i128 *%aptr
    ret void
  }

(from CodeGen/SystemZ/int-add-08.ll) we load %base+524288 and %base+524296
into separate registers, rather than using %base+524288 as a base for both.

--

Dynamic stack allocations round the size to 8 bytes and then allocate
that rounded amount. It would be simpler to subtract the unrounded
size from the copy of the stack pointer and then align the result.
See CodeGen/SystemZ/alloca-01.ll for an example.
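
For example, a variable-length array (an illustrative sketch; use() is
a placeholder):

  extern void use(char *);

  void f(unsigned long n)
  {
    /* The allocation currently rounds n up to a multiple of 8 before
       the subtraction; subtracting n and then aligning the result
       would be simpler. */
    char buf[n];
    use(buf);
  }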

--

Atomic loads and stores use the default compare-and-swap based implementation.
This is much too conservative in practice, since the architecture guarantees
that 1-, 2-, 4- and 8-byte loads and stores to aligned addresses are
inherently atomic.
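
For example, an aligned 64-bit load (an illustrative sketch) needs no
compare-and-swap loop:

  #include <stdatomic.h>

  long load_acquire(_Atomic long *p)
  {
    /* An aligned 8-byte load is already atomic on SystemZ, so a
       plain LG plus any required ordering would suffice. */
    return atomic_load_explicit(p, memory_order_acquire);
  }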

--

If needed, we can support 16-byte atomics using LPQ, STPQ and CSDG.

--

We might want to model all access registers and use them to spill
32-bit values.