comparison docs/AMDGPUUsage.rst @ 95:afa8332a0e37 LLVM3.8
LLVM 3.8
author | Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> |
---|---|
date | Tue, 13 Oct 2015 17:48:58 +0900 |
parents | |
children | 1172e4bd9c6f |
==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).


Assembler
=========

The assembler is currently considered experimental.

For syntax examples, see test/MC/AMDGPU.

Below are some of the currently supported features (modulo bugs). These
all apply to the Southern Islands ISA; Sea Islands and Volcanic Islands
are also supported but may be missing some instructions and have more bugs:

DS Instructions
---------------
All DS instructions are supported.

FLAT Instructions
------------------
These instructions are only present in the Sea Islands and Volcanic Islands
instruction sets. All FLAT instructions are supported for these architectures.

MUBUF Instructions
------------------
All non-atomic MUBUF instructions are supported.

SMRD Instructions
-----------------
Only the s_load_dword* SMRD instructions are supported.

SOP1 Instructions
-----------------
All SOP1 instructions are supported.

SOP2 Instructions
-----------------
All SOP2 instructions are supported.

SOPC Instructions
-----------------
All SOPC instructions are supported.

SOPP Instructions
-----------------

Unless otherwise mentioned, all SOPP instructions that have one or more
operands accept integer operands only. No verification is performed
on the operands, so it is up to the programmer to be familiar with the
range of acceptable values.

s_waitcnt
^^^^^^^^^

s_waitcnt accepts named arguments to specify which memory counter(s) to
wait for.

.. code-block:: nasm

   // Wait for all counters to be 0
   s_waitcnt 0

   // Equivalent to s_waitcnt 0. Counter names can also be delimited by
   // '&' or ','.
   s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

   // Wait for the vmcnt counter to be at most 1.
   s_waitcnt vmcnt(1)
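
The named-counter syntax can be sketched in Python. ``parse_waitcnt`` below is
a hypothetical helper, not part of LLVM. It only splits the operand text into
counter/value pairs, treating spaces, '&' and ',' as delimiters, and it does not
compute the instruction encoding. A counter name it does not recognise, such as
a misspelled ``lgkcmt``, is rejected.

.. code-block:: python

   import re

   # Counter names accepted in the named form of s_waitcnt.
   COUNTERS = ("vmcnt", "expcnt", "lgkmcnt")

   # One "name(value)" term plus any surrounding whitespace.
   TERM = re.compile(r"\s*([a-z]+)\((\d+)\)\s*")


   def parse_waitcnt(operand):
       """Split an s_waitcnt operand into {counter: value}.

       A bare integer operand such as "0" is returned as {"raw": 0}.
       """
       text = operand.strip()
       if text.isdigit():
           return {"raw": int(text)}
       # '&' and ',' separate terms exactly like whitespace does.
       text = re.sub(r"[&,]", " ", text)
       result = {}
       pos = 0
       while pos < len(text):
           m = TERM.match(text, pos)
           if m is None:
               raise ValueError("malformed counter term: %r" % text[pos:])
           name, value = m.group(1), int(m.group(2))
           if name not in COUNTERS:
               raise ValueError("unknown counter: %r" % name)
           result[name] = value
           pos = m.end()
       return result


   print(parse_waitcnt("vmcnt(0) & lgkmcnt(0)"))
   # prints {'vmcnt': 0, 'lgkmcnt': 0}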

VOP1, VOP2, VOP3, VOPC Instructions
-----------------------------------

All 32-bit and 64-bit encodings should work.

The assembler will automatically detect which encoding size to use for
VOP1, VOP2, and VOPC instructions based on the operands. If you want to force
a specific encoding size, you can add an _e32 (for 32-bit encoding) or
_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all,
instructions support an explicit suffix. These are all valid assembly
strings:

.. code-block:: nasm

   v_mul_i32_i24 v1, v2, v3
   v_mul_i32_i24_e32 v1, v2, v3
   v_mul_i32_i24_e64 v1, v2, v3
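
As a sketch of the suffix rule, assuming only what is described above, a
hypothetical helper can separate an explicit ``_e32``/``_e64`` suffix from the
base mnemonic. ``None`` stands for "no suffix", in which case the assembler
chooses the encoding size from the operands. The helper does not know which
instructions actually accept a suffix.

.. code-block:: python

   # Explicit suffixes and the encoding size, in bits, that each one forces.
   SUFFIXES = {"_e32": 32, "_e64": 64}


   def split_encoding_suffix(mnemonic):
       """Return (base mnemonic, forced size in bits, or None if not forced)."""
       for suffix, bits in SUFFIXES.items():
           if mnemonic.endswith(suffix):
               return mnemonic[: -len(suffix)], bits
       return mnemonic, None


   for m in ("v_mul_i32_i24", "v_mul_i32_i24_e32", "v_mul_i32_i24_e64"):
       print(m, split_encoding_suffix(m))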

Assembler Directives
--------------------

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler. These values will be stored
in an entry of the .note section.
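
The layout of such an entry can be sketched with Python's ``struct`` module. The
generic ELF note framing (name size, descriptor size and type, followed by the
name and the descriptor, each padded to a 4-byte boundary) is standard. The
owner name ``"AMD"``, the note type value 1 and a descriptor holding *major* and
*minor* as two little-endian 32-bit integers are assumptions made for this
illustration and should be checked against the AMDGPU target streamer sources.

.. code-block:: python

   import struct

   # Assumptions for illustration only: owner name and note type value.
   NOTE_NAME = b"AMD\0"              # NUL-terminated note owner name
   NT_HSA_CODE_OBJECT_VERSION = 1     # assumed note type value


   def pad4(data):
       """Pad data with NUL bytes up to a 4-byte boundary, as ELF notes do."""
       return data + b"\0" * (-len(data) % 4)


   def code_object_version_note(major, minor):
       """Build one little-endian .note entry carrying (major, minor)."""
       desc = struct.pack("<II", major, minor)
       header = struct.pack("<III", len(NOTE_NAME), len(desc),
                            NT_HSA_CODE_OBJECT_VERSION)
       return header + pad4(NOTE_NAME) + pad4(desc)


   # Entry for ".hsa_code_object_version 1,0" under the assumptions above.
   print(code_object_version_note(1, 0).hex())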

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

If no arguments are specified, then the assembler will derive the ISA version,
*vendor*, and *arch* from the value of the -mcpu option that is passed to the
assembler.

ISA version, *vendor*, and *arch* will all be stored in a single entry of the
.note section.
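
A Python sketch of how such an ISA entry might be serialized follows. Apart
from the standard ELF note framing, everything here is an assumption for
illustration: the owner name ``"AMD"``, the note type value 3, and a descriptor
made of two 16-bit string lengths (each counting the terminating NUL), the three
32-bit version numbers, and then the NUL-terminated *vendor* and *arch* strings.

.. code-block:: python

   import struct

   # Assumptions for illustration only: owner name, note type and field order.
   NOTE_NAME = b"AMD\0"
   NT_HSA_ISA = 3


   def pad4(data):
       """Pad data with NUL bytes up to a 4-byte boundary, as ELF notes do."""
       return data + b"\0" * (-len(data) % 4)


   def code_object_isa_note(major, minor, stepping, vendor="AMD", arch="AMDGPU"):
       """Build one little-endian .note entry describing the ISA version."""
       vendor_z = vendor.encode("ascii") + b"\0"
       arch_z = arch.encode("ascii") + b"\0"
       # Two 16-bit string lengths, three 32-bit version fields, then the strings.
       desc = struct.pack("<HHIII", len(vendor_z), len(arch_z),
                          major, minor, stepping) + vendor_z + arch_z
       header = struct.pack("<III", len(NOTE_NAME), len(desc), NT_HSA_ISA)
       return header + pad4(NOTE_NAME) + pad4(desc)


   # Arbitrary example version numbers, not tied to any particular GPU.
   print(code_object_isa_note(7, 0, 0).hex())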

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key/value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6. Like the alignments below, it is specified
  as a power of two, so the default means a wavefront of 2^6 = 64 work-items.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.
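
The default rules for amd_kernel_code_t keys can be summarized in a short
Python sketch. ``NONZERO_DEFAULTS``, ``resolve`` and ``pow2`` are hypothetical
names. The machine version is passed in by the caller because the assembler
derives it from -mcpu. ``pow2`` shows how a field stored as a power of two
(the alignments, and likewise *wavefront_size*) maps to an actual size.

.. code-block:: python

   # Keys whose default is not 0 (machine_version_* are handled separately).
   NONZERO_DEFAULTS = {
       "kernel_code_version_major": 1,
       "machine_kind": 1,
       "kernel_code_entry_byte_offset": 256,
       "wavefront_size": 6,
       "kernarg_segment_alignment": 4,
       "group_segment_alignment": 4,
       "private_segment_alignment": 4,
   }

   MACHINE_VERSION_KEYS = (
       "machine_version_major",
       "machine_version_minor",
       "machine_version_stepping",
   )


   def resolve(key, specified, machine_version):
       """Value used for key under the default rules.

       specified: dict of keys given between the directives.
       machine_version: (major, minor, stepping), derived from -mcpu by the caller.
       """
       if key in specified:
           return specified[key]
       if key in MACHINE_VERSION_KEYS:
           return machine_version[MACHINE_VERSION_KEYS.index(key)]
       return NONZERO_DEFAULTS.get(key, 0)


   def pow2(exponent):
       """Decode a field stored as a power of two: n means 2**n."""
       return 1 << exponent


   align = resolve("kernarg_segment_alignment", {}, (0, 0, 0))
   print(align, pow2(align))  # prints 4 16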

The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, see the examples in
test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different
keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: nasm

   .hsa_code_object_version 1,0
   .hsa_code_object_isa

   .text

   hello_world:

      .amd_kernel_code_t
         enable_sgpr_kernarg_segment_ptr = 1
         is_ptr64 = 1
         compute_pgm_rsrc1_vgprs = 0
         compute_pgm_rsrc1_sgprs = 0
         compute_pgm_rsrc2_user_sgpr = 2
         kernarg_segment_byte_size = 8
         wavefront_sgpr_count = 2
         workitem_vgpr_count = 3
      .end_amd_kernel_code_t

      // s[0:1] holds the kernarg segment pointer. Load the first kernel
      // argument, a 64-bit pointer, into s[0:1].
      s_load_dwordx2 s[0:1], s[0:1], 0x0
      v_mov_b32 v0, 3.14159
      // Wait for the scalar memory load to complete before reading s[0:1].
      s_waitcnt lgkmcnt(0)
      // Copy the pointer into the VGPR pair v[1:2].
      v_mov_b32 v1, s0
      v_mov_b32 v2, s1
      // Store the value in v0 to the address held in v[1:2].
      flat_store_dword v0, v[1:2]
      s_endpgm
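
To make the directive rules concrete, here is a hypothetical Python helper (not
part of LLVM) that collects the key/value pairs of an *.amd_kernel_code_t*
block from assembly text. It assumes one ``key = value`` pair per line with
integer values, requires the *.end_amd_kernel_code_t* terminator, and checks
that the directive directly follows a label. Keys it does not return fall back
to their defaults.

.. code-block:: python

   def parse_amd_kernel_code_t(source):
       """Return {key: int} from the first .amd_kernel_code_t block in source."""
       # Strip indentation and drop blank lines so only significant lines remain.
       lines = [line.strip() for line in source.splitlines() if line.strip()]
       start = lines.index(".amd_kernel_code_t")  # ValueError if absent
       # The directive must come right after the function label.
       if start == 0 or not lines[start - 1].endswith(":"):
           raise ValueError(".amd_kernel_code_t must follow a function label")
       fields = {}
       for line in lines[start + 1:]:
           if line == ".end_amd_kernel_code_t":
               return fields
           key, sep, value = line.partition("=")
           if not sep:
               raise ValueError("expected 'key = value', got %r" % line)
           # Base 0 accepts decimal or 0x-prefixed hexadecimal values.
           fields[key.strip()] = int(value.strip(), 0)
       raise ValueError("missing .end_amd_kernel_code_t")


   example = """
   hello_world:
      .amd_kernel_code_t
         is_ptr64 = 1
         kernarg_segment_byte_size = 8
      .end_amd_kernel_code_t
      s_endpgm
   """
   print(parse_amd_kernel_code_t(example))
   # prints {'is_ptr64': 1, 'kernarg_segment_byte_size': 8}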