==============================
User Guide for AMDGPU Back-end
==============================

Introduction
============

The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with
the R600 family up until the current Volcanic Islands (GCN Gen 3).

Refer to `AMDGPU section in Architecture & Platform Information for Compiler Writers <CompilerWriterInfo.html#amdgpu>`_
for additional documentation.

Conventions
===========

Address Spaces
--------------

The AMDGPU back-end uses the following address space mapping:

============= ============================================
Address Space Memory Space
============= ============================================
0             Private
1             Global
2             Constant
3             Local
4             Generic (Flat)
5             Region
============= ============================================

The terminology in the table, aside from the region memory space, is from the
OpenCL standard.

Assembler
=========

The AMDGPU back-end has an LLVM-MC based assembler, which is currently in
development. It supports the Southern Islands, Sea Islands and Volcanic Islands
ISAs.

This document describes general syntax for instructions and operands. For more
information about instructions, their semantics and supported combinations
of operands, refer to one of the Instruction Set Architecture manuals.

An instruction has the following syntax (register operands are
normally comma-separated while extra operands are space-separated):

*<opcode> <register_operand0>, ... <extra_operand0> ...*

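The syntax rule above can be illustrated with a small Python sketch. It is not
part of the assembler: ``split_top_level`` and ``split_instruction`` are
hypothetical helpers, and the bracket handling rests on the assumption that
commas and spaces inside ``[...]`` (as in register lists or ``quad_perm``)
belong to a single operand.

.. code-block:: python

  def split_top_level(text, sep):
      """Split text on sep (a character, or None for whitespace) outside [...]."""
      parts, current, depth = [], [], 0
      for ch in text:
          if ch == "[":
              depth += 1
          elif ch == "]":
              depth -= 1
          is_sep = ch.isspace() if sep is None else ch == sep
          if is_sep and depth == 0:
              parts.append("".join(current))
              current = []
          else:
              current.append(ch)
      parts.append("".join(current))
      return [p.strip() for p in parts if p.strip()]


  def split_instruction(line):
      """Return (opcode, register_operands, extra_operands) for one instruction."""
      tokens = line.strip().split(None, 1)
      opcode = tokens[0]
      if len(tokens) == 1:
          return opcode, [], []
      # Register operands are comma-separated; the last comma-separated piece
      # also carries the space-separated extra operands.
      pieces = split_top_level(tokens[1], ",")
      last = split_top_level(pieces[-1], None)
      return opcode, pieces[:-1] + last[:1], last[1:]


  print(split_instruction("ds_add_u32 v2, v4 offset:16"))

Operands written with spaces outside brackets, such as the ``s_waitcnt``
counter expressions, are not handled by this sketch.
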
Operands
--------

The following syntax for register operands is supported:

* SGPR registers: s0, ... or s[0], ...
* VGPR registers: v0, ... or v[0], ...
* TTMP registers: ttmp0, ... or ttmp[0], ...
* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi)
* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi)
* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ...
* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3]
* Register index expressions: v[2*2], s[1-1:2-1]
* 'off' indicates that an operand is not enabled

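As an illustration of the register forms listed above, the following Python
sketch maps a single SGPR, VGPR or TTMP operand to a ``(kind, first, last)``
tuple. The helper names are hypothetical, special registers and register lists
are left out, and the set of operators accepted in index expressions (``+``,
``-`` and ``*`` here) is an assumption rather than something this guide states.

.. code-block:: python

  import ast
  import operator
  import re

  # Operators allowed in index expressions (an assumption for this sketch).
  _OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

  # s0 / v0 / ttmp0, or s[expr] / s[expr:expr].
  _REG = re.compile(r"^(s|v|ttmp)(?:(\d+)|\[([^\]:]+)(?::([^\]]+))?\])$")


  def eval_index(expr):
      """Evaluate an integer index expression such as '2*2' or '1-1'."""
      def walk(node):
          if isinstance(node, ast.Constant) and isinstance(node.value, int):
              return node.value
          if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
              return _OPS[type(node.op)](walk(node.left), walk(node.right))
          raise ValueError(f"unsupported index expression: {expr!r}")
      return walk(ast.parse(expr.strip(), mode="eval").body)


  def parse_register(text):
      """Return (kind, first_index, last_index) for an SGPR/VGPR/TTMP operand."""
      match = _REG.match(text.strip())
      if match is None:
          raise ValueError(f"not a plain register operand: {text!r}")
      kind, plain, first, last = match.groups()
      if plain is not None:
          lo = hi = int(plain)
      else:
          lo = eval_index(first)
          hi = eval_index(last) if last is not None else lo
      if hi < lo:
          raise ValueError(f"empty register range: {text!r}")
      return kind, lo, hi


  for operand in ("s0", "v[0]", "ttmp[4:7]", "v[2*2]", "s[1-1:2-1]"):
      print(operand, parse_register(operand))
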
The following extra operands are supported:

* offset, offset0, offset1
* idxen, offen bits
* glc, slc, tfe bits
* waitcnt: integer or combination of counter values
* VOP3 modifiers:

  - abs (\| \|), neg (\-)

* DPP modifiers:

  - row_shl, row_shr, row_ror, row_rol
  - row_mirror, row_half_mirror, row_bcast
  - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm
  - row_mask, bank_mask, bound_ctrl

* SDWA modifiers:

  - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD)
  - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE)
  - abs, neg, sext

DS Instruction Examples
------------------------

.. code-block:: nasm

  ds_add_u32 v2, v4 offset:16
  ds_write_src2_b64 v2 offset0:4 offset1:8
  ds_cmpst_f32 v2, v4, v6
  ds_min_rtn_f64 v[8:9], v2, v[4:5]

For the full list of supported instructions, refer to "LDS/GDS instructions" in the ISA Manual.

FLAT Instruction Examples
--------------------------

.. code-block:: nasm

  flat_load_dword v1, v[3:4]
  flat_store_dwordx3 v[3:4], v[5:7]
  flat_atomic_swap v1, v[3:4], v5 glc
  flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc
  flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc

For the full list of supported instructions, refer to "FLAT instructions" in the ISA Manual.

MUBUF Instruction Examples
---------------------------

.. code-block:: nasm

  buffer_load_dword v1, off, s[4:7], s1
  buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe
  buffer_store_format_xy v[1:2], off, s[4:7], s1
  buffer_wbinvl1
  buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc

For the full list of supported instructions, refer to "MUBUF Instructions" in the ISA Manual.

SMRD/SMEM Instruction Examples
-------------------------------

.. code-block:: nasm

  s_load_dword s1, s[2:3], 0xfc
  s_load_dwordx8 s[8:15], s[2:3], s4
  s_load_dwordx16 s[88:103], s[2:3], s4
  s_dcache_inv_vol
  s_memtime s[4:5]

For the full list of supported instructions, refer to "Scalar Memory Operations" in the ISA Manual.

SOP1 Instruction Examples
--------------------------

.. code-block:: nasm

  s_mov_b32 s1, s2
  s_mov_b64 s[0:1], 0x80000000
  s_cmov_b32 s1, 200
  s_wqm_b64 s[2:3], s[4:5]
  s_bcnt0_i32_b64 s1, s[2:3]
  s_swappc_b64 s[2:3], s[4:5]
  s_cbranch_join s[4:5]

For the full list of supported instructions, refer to "SOP1 Instructions" in the ISA Manual.

SOP2 Instruction Examples
-------------------------

.. code-block:: nasm

  s_add_u32 s1, s2, s3
  s_and_b64 s[2:3], s[4:5], s[6:7]
  s_cselect_b32 s1, s2, s3
  s_andn2_b32 s2, s4, s6
  s_lshr_b64 s[2:3], s[4:5], s6
  s_ashr_i32 s2, s4, s6
  s_bfm_b64 s[2:3], s4, s6
  s_bfe_i64 s[2:3], s[4:5], s6
  s_cbranch_g_fork s[4:5], s[6:7]

For the full list of supported instructions, refer to "SOP2 Instructions" in the ISA Manual.

SOPC Instruction Examples
--------------------------

.. code-block:: nasm

  s_cmp_eq_i32 s1, s2
  s_bitcmp1_b32 s1, s2
  s_bitcmp0_b64 s[2:3], s4
  s_setvskip s3, s5

For the full list of supported instructions, refer to "SOPC Instructions" in the ISA Manual.

SOPP Instruction Examples
--------------------------

.. code-block:: nasm

  s_barrier
  s_nop 2
  s_endpgm
  s_waitcnt 0                                   ; Wait for all counters to be 0
  s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0)   ; Equivalent to above
  s_waitcnt vmcnt(1)                            ; Wait for vmcnt counter to be 1.
  s_sethalt 9
  s_sleep 10
  s_sendmsg 0x1
  s_sendmsg sendmsg(MSG_INTERRUPT)
  s_trap 1

For the full list of supported instructions, refer to "SOPP Instructions" in the ISA Manual.

Unless otherwise mentioned, little verification is performed on the operands
of SOPP instructions, so it is up to the programmer to be familiar with the
range of acceptable values.

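The relationship between the integer and the counter forms of ``s_waitcnt``
can be sketched in Python. The field layout below (``vmcnt`` in bits 3:0,
``expcnt`` in bits 6:4, ``lgkmcnt`` in bits 11:8) is an assumption taken from
the Volcanic Islands encoding and is not stated in this guide; check it against
the ISA manual for the target. ``encode_waitcnt`` is a hypothetical helper that
also assumes a counter left out of the expression keeps its maximum value, so
it is not waited on.

.. code-block:: python

  # Assumed layout of the 16-bit s_waitcnt immediate: name -> (shift, width).
  FIELDS = {"vmcnt": (0, 4), "expcnt": (4, 3), "lgkmcnt": (8, 4)}


  def encode_waitcnt(**counts):
      """Pack counter values into an immediate; omitted counters stay at max."""
      unknown = set(counts) - set(FIELDS)
      if unknown:
          raise ValueError(f"unknown counters: {sorted(unknown)}")
      imm = 0
      for name, (shift, width) in FIELDS.items():
          max_value = (1 << width) - 1
          value = counts.get(name, max_value)
          if not 0 <= value <= max_value:
              raise ValueError(f"{name} out of range: {value}")
          imm |= value << shift
      return imm


  print(hex(encode_waitcnt(vmcnt=0, expcnt=0, lgkmcnt=0)))
  print(hex(encode_waitcnt(vmcnt=1)))

Under these assumptions, naming all three counters with a value of 0 gives the
same immediate as ``s_waitcnt 0``.
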
Vector ALU Instruction Examples
-------------------------------

For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA),
the assembler will automatically use the optimal encoding based on the operands.
To force a specific encoding, one can add a suffix to the opcode of the instruction:

* _e32 for 32-bit VOP1/VOP2/VOPC
* _e64 for 64-bit VOP3
* _dpp for VOP_DPP
* _sdwa for VOP_SDWA

VOP1/VOP2/VOP3/VOPC examples:

.. code-block:: nasm

  v_mov_b32 v1, v2
  v_mov_b32_e32 v1, v2
  v_nop
  v_cvt_f64_i32_e32 v[1:2], v2
  v_floor_f32_e32 v1, v2
  v_bfrev_b32_e32 v1, v2
  v_add_f32_e32 v1, v2, v3
  v_mul_i32_i24_e64 v1, v2, 3
  v_mul_i32_i24_e32 v1, -3, v3
  v_mul_i32_i24_e32 v1, -100, v3
  v_addc_u32 v1, s[0:1], v2, v3, s[2:3]
  v_max_f16_e32 v1, v2, v3

VOP_DPP examples:

.. code-block:: nasm

  v_mov_b32 v0, v0 quad_perm:[0,2,1,1]
  v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_mov_b32 v0, v0 wave_shl:1
  v_mov_b32 v0, v0 row_mirror
  v_mov_b32 v0, v0 row_bcast:31
  v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0
  v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0

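To make the ``quad_perm`` operand in the examples above more concrete, here is
a Python sketch of the lane selection it requests. It assumes that, within each
group of four lanes, lane *i* reads the source value of lane ``perm[i]``, and
that the four selectors are packed two bits each into the DPP control field.
Both helpers are hypothetical, and ``row_mask``, ``bank_mask``, ``bound_ctrl``
and the exec mask are ignored.

.. code-block:: python

  def check_quad_perm(perm):
      """Validate a quad_perm selector list such as [0, 2, 1, 1]."""
      if len(perm) != 4 or any(not 0 <= p <= 3 for p in perm):
          raise ValueError("quad_perm needs four selectors in the range 0..3")


  def apply_quad_perm(lanes, perm):
      """Permute lane values within each group of four lanes."""
      check_quad_perm(perm)
      if len(lanes) % 4 != 0:
          raise ValueError("lane count must be a multiple of 4")
      result = []
      for base in range(0, len(lanes), 4):
          quad = lanes[base:base + 4]
          result.extend(quad[p] for p in perm)
      return result


  def encode_quad_perm(perm):
      """Pack the selectors two bits each, lane 0 in the low bits (assumed)."""
      check_quad_perm(perm)
      return sum(p << (2 * i) for i, p in enumerate(perm))


  print(apply_quad_perm(list(range(8)), [0, 2, 1, 1]))
  print(hex(encode_quad_perm([0, 2, 1, 1])))
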
VOP_SDWA examples:

.. code-block:: nasm

  v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
  v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD
  v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
  v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0

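The SDWA selectors used above can be pictured with a short Python sketch. It
assumes that ``BYTE_N`` names bits 8N+7..8N and ``WORD_M`` bits 16M+15..16M of
a 32-bit register, that ``UNUSED_PAD`` zero-fills the unselected destination
bits, and that ``UNUSED_PRESERVE`` keeps them. ``UNUSED_SEXT`` and the ``abs``,
``neg`` and ``sext`` modifiers are left out. The helper names are hypothetical,
and the ISA manual remains the reference for the exact semantics.

.. code-block:: python

  # Selector name -> (shift, mask) within a 32-bit register (assumed layout).
  SELECTORS = {
      "BYTE_0": (0, 0xFF), "BYTE_1": (8, 0xFF),
      "BYTE_2": (16, 0xFF), "BYTE_3": (24, 0xFF),
      "WORD_0": (0, 0xFFFF), "WORD_1": (16, 0xFFFF),
      "DWORD": (0, 0xFFFFFFFF),
  }


  def select_source(value, sel):
      """Extract the sub-dword named by src0_sel / src1_sel."""
      shift, mask = SELECTORS[sel]
      return (value >> shift) & mask


  def write_dest(old, result, sel, unused):
      """Place result in the sub-dword named by dst_sel."""
      shift, mask = SELECTORS[sel]
      field = (result & mask) << shift
      if unused == "UNUSED_PAD":
          return field
      if unused == "UNUSED_PRESERVE":
          return (old & ~(mask << shift) & 0xFFFFFFFF) | field
      raise ValueError(f"dst_unused mode not covered by this sketch: {unused}")


  print(hex(select_source(0x11223344, "BYTE_1")))
  print(hex(write_dest(0xAABBCCDD, 0x12, "BYTE_0", "UNUSED_PRESERVE")))
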
For the full list of supported instructions, refer to "Vector ALU instructions" in the ISA Manual.

HSA Code Object Directives
--------------------------

The AMDGPU ABI defines auxiliary data in the output code object. In assembly
source, this data can be specified with assembler directives.

.hsa_code_object_version major, minor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major* and *minor* are integers that specify the version of the HSA code
object that will be generated by the assembler.

.hsa_code_object_isa [major, minor, stepping, vendor, arch]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

*major*, *minor*, and *stepping* are all integers that describe the instruction
set architecture (ISA) version of the assembly program.

*vendor* and *arch* are quoted strings. *vendor* should always be equal to
"AMD" and *arch* should always be equal to "AMDGPU".

By default, the assembler will derive the ISA version, *vendor*, and *arch*
from the value of the -mcpu option that is passed to the assembler.

.amdgpu_hsa_kernel (name)
^^^^^^^^^^^^^^^^^^^^^^^^^

This directive specifies that the symbol with the given name is a kernel entry
point (label) and that the object should contain a corresponding symbol of type
STT_AMDGPU_HSA_KERNEL.

.amd_kernel_code_t
^^^^^^^^^^^^^^^^^^

This directive marks the beginning of a list of key / value pairs that are used
to specify the amd_kernel_code_t object that will be emitted by the assembler.
The list must be terminated by the *.end_amd_kernel_code_t* directive. For
any amd_kernel_code_t values that are unspecified, a default value will be
used. The default value for all keys is 0, with the following exceptions:

- *kernel_code_version_major* defaults to 1.
- *machine_kind* defaults to 1.
- *machine_version_major*, *machine_version_minor*, and
  *machine_version_stepping* are derived from the value of the -mcpu option
  that is passed to the assembler.
- *kernel_code_entry_byte_offset* defaults to 256.
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
  *private_segment_alignment* default to 4. Note that alignments are specified
  as a power of two, so a value of **n** means an alignment of 2^ **n**.

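The power-of-two convention for the alignment keys can be checked with a few
lines of Python. Both helper names are hypothetical and exist only for this
illustration.

.. code-block:: python

  def alignment_in_bytes(log2_alignment):
      """Convert an amd_kernel_code_t alignment value n into 2**n bytes."""
      if log2_alignment < 0:
          raise ValueError("alignment exponent must be non-negative")
      return 1 << log2_alignment


  def log2_of_alignment(alignment_bytes):
      """Convert a power-of-two byte alignment back into the stored value n."""
      if alignment_bytes <= 0 or alignment_bytes & (alignment_bytes - 1):
          raise ValueError("alignment must be a positive power of two")
      return alignment_bytes.bit_length() - 1


  print(alignment_in_bytes(4))   # default value of the three alignment keys
  print(log2_of_alignment(256))

With the default value of 4, each of the three segments is therefore aligned to
16 bytes.
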
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.

For a full list of amd_kernel_code_t keys, refer to the AMDGPU ABI document,
the comments in lib/Target/AMDGPU/AmdKernelCodeT.h, and
test/CodeGen/AMDGPU/hsa.s.

Here is an example of a minimal amd_kernel_code_t specification:

.. code-block:: none

  .hsa_code_object_version 1,0
  .hsa_code_object_isa

  .hsatext
  .globl hello_world
  .p2align 8
  .amdgpu_hsa_kernel hello_world

  hello_world:

    .amd_kernel_code_t
      enable_sgpr_kernarg_segment_ptr = 1
      is_ptr64 = 1
      compute_pgm_rsrc1_vgprs = 0
      compute_pgm_rsrc1_sgprs = 0
      compute_pgm_rsrc2_user_sgpr = 2
      kernarg_segment_byte_size = 8
      wavefront_sgpr_count = 2
      workitem_vgpr_count = 3
    .end_amd_kernel_code_t

    s_load_dwordx2 s[0:1], s[0:1] 0x0
    v_mov_b32 v0, 3.14159
    s_waitcnt lgkmcnt(0)
    v_mov_b32 v1, s0
    v_mov_b32 v2, s1
    flat_store_dword v[1:2], v0
    s_endpgm
  .Lfunc_end0:
    .size hello_world, .Lfunc_end0-hello_world

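When assembly like the example above is produced by a script, the key / value
list can be generated from a dictionary. The following Python sketch shows one
possible way to do that. ``render_kernel_code_t`` is a hypothetical helper and
not part of LLVM, and it does not check that the keys are valid
amd_kernel_code_t keys.

.. code-block:: python

  def render_kernel_code_t(values, indent="  "):
      """Render a dict of integer settings as an .amd_kernel_code_t block."""
      lines = [f"{indent}.amd_kernel_code_t"]
      for key, value in values.items():
          if isinstance(value, bool) or not isinstance(value, int):
              raise TypeError(f"{key} must be an integer")
          lines.append(f"{indent * 2}{key} = {value}")
      lines.append(f"{indent}.end_amd_kernel_code_t")
      return "\n".join(lines)


  print(render_kernel_code_t({
      "enable_sgpr_kernarg_segment_ptr": 1,
      "is_ptr64": 1,
      "kernarg_segment_byte_size": 8,
  }))

Keys left out of the dictionary are simply not emitted, so the assembler
defaults described earlier apply to them.
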