annotate docs/NVPTXUsage.rst @ 107:a03ddd01be7e

resolve warnings
author Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
date Sun, 31 Jan 2016 17:34:49 +0900
parents afa8332a0e37
children 1172e4bd9c6f
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 =============================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 User Guide for NVPTX Back-end
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 =============================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 .. contents::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 :local:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7 :depth: 3
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10 Introduction
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 ============
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 To support GPU programming, the NVPTX back-end supports a subset of LLVM IR
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 along with a defined set of conventions used to represent GPU programming
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 concepts. This document provides an overview of the general usage of the back-
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 end, including a description of the conventions used and the set of accepted
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
17 LLVM IR.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 .. note::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21 This document assumes a basic familiarity with CUDA and the PTX
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22 assembly language. Information about the CUDA Driver API and the PTX assembly
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 language can be found in the `CUDA documentation
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24 <http://docs.nvidia.com/cuda/index.html>`_.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28 Conventions
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29 ===========
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31 Marking Functions as Kernels
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32 ----------------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34 In PTX, there are two types of functions: *device functions*, which are only
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35 callable by device code, and *kernel functions*, which are callable by host
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36 code. By default, the back-end will emit device functions. Metadata is used to
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37 declare a function as a kernel function. This metadata is attached to the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38 ``nvvm.annotations`` named metadata object, and has the following format:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42 !0 = metadata !{<function-ref>, metadata !"kernel", i32 1}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44 The first parameter is a reference to the kernel function. The following
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45 example shows a kernel function calling a device function in LLVM IR. The
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46 function ``@my_kernel`` is callable from host code, but ``@my_fmad`` is not.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50 define float @my_fmad(float %x, float %y, float %z) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51 %mul = fmul float %x, %y
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52 %add = fadd float %mul, %z
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53 ret float %add
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
54 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 define void @my_kernel(float* %ptr) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57 %val = load float* %ptr
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58 %ret = call float @my_fmad(float %val, float %val, float %val)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59 store float %ret, float* %ptr
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60 ret void
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63 !nvvm.annotations = !{!1}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
64 !1 = metadata !{void (float*)* @my_kernel, metadata !"kernel", i32 1}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
66 When compiled, the PTX kernel functions are callable by host-side code.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
68
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
69 .. _address_spaces:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
70
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
71 Address Spaces
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
72 --------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
73
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
74 The NVPTX back-end uses the following address space mapping:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
75
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
76 ============= ======================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
77 Address Space Memory Space
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
78 ============= ======================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
79 0 Generic
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
80 1 Global
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
81 2 Internal Use
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
82 3 Shared
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
83 4 Constant
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
84 5 Local
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
85 ============= ======================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
86
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
87 Every global variable and pointer type is assigned to one of these address
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
88 spaces, with 0 being the default address space. Intrinsics are provided which
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
89 can be used to convert pointers between the generic and non-generic address
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
90 spaces.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
91
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
92 As an example, the following IR will define an array ``@g`` that resides in
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
93 global device memory.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
94
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
95 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
96
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
97 @g = internal addrspace(1) global [4 x i32] [ i32 0, i32 1, i32 2, i32 3 ]
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
98
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
99 LLVM IR functions can read and write to this array, and host-side code can
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
100 copy data to it by name with the CUDA Driver API.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
101
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
102 Note that since address space 0 is the generic space, it is illegal to have
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
103 global variables in address space 0. Address space 0 is the default address
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
104 space in LLVM, so the ``addrspace(N)`` annotation is *required* for global
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
105 variables.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
106
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
107
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
108 Triples
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
109 -------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
110
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
111 The NVPTX target uses the module triple to select between 32/64-bit code
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
112 generation and the driver-compiler interface to use. The triple architecture
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
113 can be one of ``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX). The
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
114 operating system should be one of ``cuda`` or ``nvcl``, which determines the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
115 interface used by the generated code to communicate with the driver. Most
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
116 users will want to use ``cuda`` as the operating system, which makes the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
117 generated PTX compatible with the CUDA Driver API.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
118
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
119 Example: 32-bit PTX for CUDA Driver API: ``nvptx-nvidia-cuda``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
120
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
121 Example: 64-bit PTX for CUDA Driver API: ``nvptx64-nvidia-cuda``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
122
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
123
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
124
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
125 .. _nvptx_intrinsics:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
126
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
127 NVPTX Intrinsics
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
128 ================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
129
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
130 Address Space Conversion
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
131 ------------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
132
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
133 '``llvm.nvvm.ptr.*.to.gen``' Intrinsics
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
134 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
135
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
136 Syntax:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
137 """""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
138
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
139 These are overloaded intrinsics. You can use these on any pointer types.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
140
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
141 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
142
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
143 declare i8* @llvm.nvvm.ptr.global.to.gen.p0i8.p1i8(i8 addrspace(1)*)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
144 declare i8* @llvm.nvvm.ptr.shared.to.gen.p0i8.p3i8(i8 addrspace(3)*)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
145 declare i8* @llvm.nvvm.ptr.constant.to.gen.p0i8.p4i8(i8 addrspace(4)*)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
146 declare i8* @llvm.nvvm.ptr.local.to.gen.p0i8.p5i8(i8 addrspace(5)*)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
147
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
148 Overview:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
149 """""""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
150
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
151 The '``llvm.nvvm.ptr.*.to.gen``' intrinsics convert a pointer in a non-generic
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
152 address space to a generic address space pointer.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
153
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
154 Semantics:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
155 """"""""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
156
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
157 These intrinsics modify the pointer value to be a valid generic address space
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
158 pointer.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
159
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
160
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
161 '``llvm.nvvm.ptr.gen.to.*``' Intrinsics
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
162 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
163
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
164 Syntax:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
165 """""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
166
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
167 These are overloaded intrinsics. You can use these on any pointer types.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
168
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
169 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
170
95
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 77
diff changeset
171 declare i8 addrspace(1)* @llvm.nvvm.ptr.gen.to.global.p1i8.p0i8(i8*)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 77
diff changeset
172 declare i8 addrspace(3)* @llvm.nvvm.ptr.gen.to.shared.p3i8.p0i8(i8*)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 77
diff changeset
173 declare i8 addrspace(4)* @llvm.nvvm.ptr.gen.to.constant.p4i8.p0i8(i8*)
afa8332a0e37 LLVM 3.8
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 77
diff changeset
174 declare i8 addrspace(5)* @llvm.nvvm.ptr.gen.to.local.p5i8.p0i8(i8*)
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
175
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
176 Overview:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
177 """""""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
178
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
179 The '``llvm.nvvm.ptr.gen.to.*``' intrinsics convert a pointer in the generic
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
180 address space to a pointer in the target address space. Note that these
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
181 intrinsics are only useful if the address space of the target address space of
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
182 the pointer is known. It is not legal to use address space conversion
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
183 intrinsics to convert a pointer from one non-generic address space to another
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
184 non-generic address space.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
185
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
186 Semantics:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
187 """"""""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
188
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
189 These intrinsics modify the pointer value to be a valid pointer in the target
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
190 non-generic address space.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
191
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
192
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
193 Reading PTX Special Registers
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
194 -----------------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
195
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
196 '``llvm.nvvm.read.ptx.sreg.*``'
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
197 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
198
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
199 Syntax:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
200 """""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
201
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
202 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
203
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
204 declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
205 declare i32 @llvm.nvvm.read.ptx.sreg.tid.y()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
206 declare i32 @llvm.nvvm.read.ptx.sreg.tid.z()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
207 declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
208 declare i32 @llvm.nvvm.read.ptx.sreg.ntid.y()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
209 declare i32 @llvm.nvvm.read.ptx.sreg.ntid.z()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
210 declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
211 declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
212 declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.z()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
213 declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.x()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
214 declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.y()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
215 declare i32 @llvm.nvvm.read.ptx.sreg.nctaid.z()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
216 declare i32 @llvm.nvvm.read.ptx.sreg.warpsize()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
217
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
218 Overview:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
219 """""""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
220
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
221 The '``@llvm.nvvm.read.ptx.sreg.*``' intrinsics provide access to the PTX
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
222 special registers, in particular the kernel launch bounds. These registers
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
223 map in the following way to CUDA builtins:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
224
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
225 ============ =====================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
226 CUDA Builtin PTX Special Register Intrinsic
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
227 ============ =====================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
228 ``threadId`` ``@llvm.nvvm.read.ptx.sreg.tid.*``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
229 ``blockIdx`` ``@llvm.nvvm.read.ptx.sreg.ctaid.*``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
230 ``blockDim`` ``@llvm.nvvm.read.ptx.sreg.ntid.*``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
231 ``gridDim`` ``@llvm.nvvm.read.ptx.sreg.nctaid.*``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
232 ============ =====================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
233
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
234
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
235 Barriers
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
236 --------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
237
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
238 '``llvm.nvvm.barrier0``'
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
239 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
240
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
241 Syntax:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
242 """""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
243
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
244 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
245
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
246 declare void @llvm.nvvm.barrier0()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
247
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
248 Overview:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
249 """""""""
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
250
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
251 The '``@llvm.nvvm.barrier0()``' intrinsic emits a PTX ``bar.sync 0``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
252 instruction, equivalent to the ``__syncthreads()`` call in CUDA.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
253
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
254
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
255 Other Intrinsics
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
256 ----------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
257
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
258 For the full set of NVPTX intrinsics, please see the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
259 ``include/llvm/IR/IntrinsicsNVVM.td`` file in the LLVM source tree.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
260
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
261
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
262 .. _libdevice:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
263
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
264 Linking with Libdevice
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
265 ======================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
266
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
267 The CUDA Toolkit comes with an LLVM bitcode library called ``libdevice`` that
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
268 implements many common mathematical functions. This library can be used as a
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
269 high-performance math library for any compilers using the LLVM NVPTX target.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
270 The library can be found under ``nvvm/libdevice/`` in the CUDA Toolkit and
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
271 there is a separate version for each compute architecture.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
272
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
273 For a list of all math functions implemented in libdevice, see
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
274 `libdevice Users Guide <http://docs.nvidia.com/cuda/libdevice-users-guide/index.html>`_.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
275
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
276 To accommodate various math-related compiler flags that can affect code
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
277 generation of libdevice code, the library code depends on a special LLVM IR
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
278 pass (``NVVMReflect``) to handle conditional compilation within LLVM IR. This
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
279 pass looks for calls to the ``@__nvvm_reflect`` function and replaces them
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
280 with constants based on the defined reflection parameters. Such conditional
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
281 code often follows a pattern:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
282
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
283 .. code-block:: c++
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
284
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
285 float my_function(float a) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
286 if (__nvvm_reflect("FASTMATH"))
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
287 return my_function_fast(a);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
288 else
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
289 return my_function_precise(a);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
290 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
291
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
292 The default value for all unspecified reflection parameters is zero.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
293
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
294 The ``NVVMReflect`` pass should be executed early in the optimization
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
295 pipeline, immediately after the link stage. The ``internalize`` pass is also
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
296 recommended to remove unused math functions from the resulting PTX. For an
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
297 input IR module ``module.bc``, the following compilation flow is recommended:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
298
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
299 1. Save list of external functions in ``module.bc``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
300 2. Link ``module.bc`` with ``libdevice.compute_XX.YY.bc``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
301 3. Internalize all functions not in list from (1)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
302 4. Eliminate all unused internal functions
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
303 5. Run ``NVVMReflect`` pass
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
304 6. Run standard optimization pipeline
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
305
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
306 .. note::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
307
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
308 ``linkonce`` and ``linkonce_odr`` linkage types are not suitable for the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
309 libdevice functions. It is possible to link two IR modules that have been
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
310 linked against libdevice using different reflection variables.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
311
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
312 Since the ``NVVMReflect`` pass replaces conditionals with constants, it will
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
313 often leave behind dead code of the form:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
314
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
315 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
316
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
317 entry:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
318 ..
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
319 br i1 true, label %foo, label %bar
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
320 foo:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
321 ..
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
322 bar:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
323 ; Dead code
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
324 ..
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
325
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
326 Therefore, it is recommended that ``NVVMReflect`` is executed early in the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
327 optimization pipeline before dead-code elimination.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
328
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
329
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
330 Reflection Parameters
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
331 ---------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
332
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
333 The libdevice library currently uses the following reflection parameters to
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
334 control code generation:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
335
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
336 ==================== ======================================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
337 Flag Description
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
338 ==================== ======================================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
339 ``__CUDA_FTZ=[0,1]`` Use optimized code paths that flush subnormals to zero
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
340 ==================== ======================================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
341
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
342
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
343 Invoking NVVMReflect
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
344 --------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
345
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
346 To ensure that all dead code caused by the reflection pass is eliminated, it
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
347 is recommended that the reflection pass is executed early in the LLVM IR
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
348 optimization pipeline. The pass takes an optional mapping of reflection
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
349 parameter name to an integer value. This mapping can be specified as either a
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
350 command-line option to ``opt`` or as an LLVM ``StringMap<int>`` object when
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
351 programmatically creating a pass pipeline.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
352
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
353 With ``opt``:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
354
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
355 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
356
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
357 # opt -nvvm-reflect -nvvm-reflect-list=<var>=<value>,<var>=<value> module.bc -o module.reflect.bc
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
358
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
359
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
360 With programmatic pass pipeline:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
361
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
362 .. code-block:: c++
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
363
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
364 extern ModulePass *llvm::createNVVMReflectPass(const StringMap<int>& Mapping);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
365
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
366 StringMap<int> ReflectParams;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
367 ReflectParams["__CUDA_FTZ"] = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
368 Passes.add(createNVVMReflectPass(ReflectParams));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
369
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
370
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
371
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
372 Executing PTX
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
373 =============
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
374
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
375 The most common way to execute PTX assembly on a GPU device is to use the CUDA
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
376 Driver API. This API is a low-level interface to the GPU driver and allows for
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
377 JIT compilation of PTX code to native GPU machine code.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
378
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
379 Initializing the Driver API:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
380
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
381 .. code-block:: c++
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
382
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
383 CUdevice device;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
384 CUcontext context;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
385
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
386 // Initialize the driver API
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
387 cuInit(0);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
388 // Get a handle to the first compute device
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
389 cuDeviceGet(&device, 0);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
390 // Create a compute device context
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
391 cuCtxCreate(&context, 0, device);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
392
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
393 JIT compiling a PTX string to a device binary:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
394
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
395 .. code-block:: c++
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
396
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
397 CUmodule module;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
398 CUfunction funcion;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
399
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
400 // JIT compile a null-terminated PTX string
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
401 cuModuleLoadData(&module, (void*)PTXString);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
402
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
403 // Get a handle to the "myfunction" kernel function
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
404 cuModuleGetFunction(&function, module, "myfunction");
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
405
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
406 For full examples of executing PTX assembly, please see the `CUDA Samples
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
407 <https://developer.nvidia.com/cuda-downloads>`_ distribution.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
408
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
409
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
410 Common Issues
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
411 =============
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
412
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
413 ptxas complains of undefined function: __nvvm_reflect
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
414 -----------------------------------------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
415
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
416 When linking with libdevice, the ``NVVMReflect`` pass must be used. See
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
417 :ref:`libdevice` for more information.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
418
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
419
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
420 Tutorial: A Simple Compute Kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
421 =================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
422
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
423 To start, let us take a look at a simple compute kernel written directly in
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
424 LLVM IR. The kernel implements vector addition, where each thread computes one
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
425 element of the output vector C from the input vectors A and B. To make this
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
426 easier, we also assume that only a single CTA (thread block) will be launched,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
427 and that it will be one dimensional.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
428
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
429
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
430 The Kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
431 ----------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
432
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
433 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
434
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
435 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
436 target triple = "nvptx64-nvidia-cuda"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
437
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
438 ; Intrinsic to read X component of thread ID
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
439 declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
440
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
441 define void @kernel(float addrspace(1)* %A,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
442 float addrspace(1)* %B,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
443 float addrspace(1)* %C) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
444 entry:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
445 ; What is my ID?
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
446 %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
447
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
448 ; Compute pointers into A, B, and C
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
449 %ptrA = getelementptr float addrspace(1)* %A, i32 %id
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
450 %ptrB = getelementptr float addrspace(1)* %B, i32 %id
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
451 %ptrC = getelementptr float addrspace(1)* %C, i32 %id
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
452
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
453 ; Read A, B
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
454 %valA = load float addrspace(1)* %ptrA, align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
455 %valB = load float addrspace(1)* %ptrB, align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
456
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
457 ; Compute C = A + B
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
458 %valC = fadd float %valA, %valB
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
459
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
460 ; Store back to C
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
461 store float %valC, float addrspace(1)* %ptrC, align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
462
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
463 ret void
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
464 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
465
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
466 !nvvm.annotations = !{!0}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
467 !0 = metadata !{void (float addrspace(1)*,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
468 float addrspace(1)*,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
469 float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
470
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
471
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
472 We can use the LLVM ``llc`` tool to directly run the NVPTX code generator:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
473
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
474 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
475
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
476 # llc -mcpu=sm_20 kernel.ll -o kernel.ptx
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
477
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
478
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
479 .. note::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
480
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
481 If you want to generate 32-bit code, change ``p:64:64:64`` to ``p:32:32:32``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
482 in the module data layout string and use ``nvptx-nvidia-cuda`` as the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
483 target triple.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
484
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
485
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
486 The output we get from ``llc`` (as of LLVM 3.4):
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
487
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
488 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
489
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
490 //
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
491 // Generated by LLVM NVPTX Back-End
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
492 //
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
493
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
494 .version 3.1
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
495 .target sm_20
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
496 .address_size 64
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
497
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
498 // .globl kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
499 // @kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
500 .visible .entry kernel(
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
501 .param .u64 kernel_param_0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
502 .param .u64 kernel_param_1,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
503 .param .u64 kernel_param_2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
504 )
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
505 {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
506 .reg .f32 %f<4>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
507 .reg .s32 %r<2>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
508 .reg .s64 %rl<8>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
509
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
510 // BB#0: // %entry
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
511 ld.param.u64 %rl1, [kernel_param_0];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
512 mov.u32 %r1, %tid.x;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
513 mul.wide.s32 %rl2, %r1, 4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
514 add.s64 %rl3, %rl1, %rl2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
515 ld.param.u64 %rl4, [kernel_param_1];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
516 add.s64 %rl5, %rl4, %rl2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
517 ld.param.u64 %rl6, [kernel_param_2];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
518 add.s64 %rl7, %rl6, %rl2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
519 ld.global.f32 %f1, [%rl3];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
520 ld.global.f32 %f2, [%rl5];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
521 add.f32 %f3, %f1, %f2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
522 st.global.f32 [%rl7], %f3;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
523 ret;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
524 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
525
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
526
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
527 Dissecting the Kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
528 ---------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
529
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
530 Now let us dissect the LLVM IR that makes up this kernel.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
531
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
532 Data Layout
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
533 ^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
534
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
535 The data layout string determines the size in bits of common data types, their
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
536 ABI alignment, and their storage size. For NVPTX, you should use one of the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
537 following:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
538
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
539 32-bit PTX:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
540
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
541 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
542
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
543 target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
544
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
545 64-bit PTX:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
546
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
547 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
548
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
549 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
550
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
551
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
552 Target Intrinsics
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
553 ^^^^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
554
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
555 In this example, we use the ``@llvm.nvvm.read.ptx.sreg.tid.x`` intrinsic to
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
556 read the X component of the current thread's ID, which corresponds to a read
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
557 of register ``%tid.x`` in PTX. The NVPTX back-end supports a large set of
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
558 intrinsics. A short list is shown below; please see
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
559 ``include/llvm/IR/IntrinsicsNVVM.td`` for the full list.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
560
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
561
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
562 ================================================ ====================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
563 Intrinsic CUDA Equivalent
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
564 ================================================ ====================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
565 ``i32 @llvm.nvvm.read.ptx.sreg.tid.{x,y,z}`` threadIdx.{x,y,z}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
566 ``i32 @llvm.nvvm.read.ptx.sreg.ctaid.{x,y,z}`` blockIdx.{x,y,z}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
567 ``i32 @llvm.nvvm.read.ptx.sreg.ntid.{x,y,z}`` blockDim.{x,y,z}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
568 ``i32 @llvm.nvvm.read.ptx.sreg.nctaid.{x,y,z}`` gridDim.{x,y,z}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
569 ``void @llvm.cuda.syncthreads()`` __syncthreads()
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
570 ================================================ ====================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
571
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
572
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
573 Address Spaces
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
574 ^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
575
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
576 You may have noticed that all of the pointer types in the LLVM IR example had
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
577 an explicit address space specifier. What is address space 1? NVIDIA GPU
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
578 devices (generally) have four types of memory:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
579
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
580 - Global: Large, off-chip memory
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
581 - Shared: Small, on-chip memory shared among all threads in a CTA
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
582 - Local: Per-thread, private memory
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
583 - Constant: Read-only memory shared across all threads
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
584
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
585 These different types of memory are represented in LLVM IR as address spaces.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
586 There is also a fifth address space used by the NVPTX code generator that
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
587 corresponds to the "generic" address space. This address space can represent
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
588 addresses in any other address space (with a few exceptions). This allows
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
589 users to write IR functions that can load/store memory using the same
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
590 instructions. Intrinsics are provided to convert pointers between the generic
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
591 and non-generic address spaces.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
592
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
593 See :ref:`address_spaces` and :ref:`nvptx_intrinsics` for more information.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
594
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
595
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
596 Kernel Metadata
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
597 ^^^^^^^^^^^^^^^
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
598
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
599 In PTX, a function can be either a `kernel` function (callable from the host
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
600 program), or a `device` function (callable only from GPU code). You can think
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
601 of `kernel` functions as entry-points in the GPU program. To mark an LLVM IR
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
602 function as a `kernel` function, we make use of special LLVM metadata. The
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
603 NVPTX back-end will look for a named metadata node called
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
604 ``nvvm.annotations``. This named metadata must contain a list of metadata that
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
605 describe the IR. For our purposes, we need to declare a metadata node that
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
606 assigns the "kernel" attribute to the LLVM IR function that should be emitted
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
607 as a PTX `kernel` function. These metadata nodes take the form:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
608
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
609 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
610
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
611 metadata !{<function ref>, metadata !"kernel", i32 1}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
612
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
613 For the previous example, we have:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
614
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
615 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
616
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
617 !nvvm.annotations = !{!0}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
618 !0 = metadata !{void (float addrspace(1)*,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
619 float addrspace(1)*,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
620 float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
621
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
622 Here, we have a single metadata declaration in ``nvvm.annotations``. This
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
623 metadata annotates our ``@kernel`` function with the ``kernel`` attribute.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
624
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
625
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
626 Running the Kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
627 ------------------
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
628
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
629 Generating PTX from LLVM IR is all well and good, but how do we execute it on
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
630 a real GPU device? The CUDA Driver API provides a convenient mechanism for
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
631 loading and JIT compiling PTX to a native GPU device, and launching a kernel.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
632 The API is similar to OpenCL. A simple example showing how to load and
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
633 execute our vector addition code is shown below. Note that for brevity this
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
634 code does not perform much error checking!
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
635
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
636 .. note::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
637
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
638 You can also use the ``ptxas`` tool provided by the CUDA Toolkit to offline
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
639 compile PTX to machine code (SASS) for a specific GPU architecture. Such
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
640 binaries can be loaded by the CUDA Driver API in the same way as PTX. This
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
641 can be useful for reducing startup time by precompiling the PTX kernels.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
642
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
643
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
644 .. code-block:: c++
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
645
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
646 #include <iostream>
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
647 #include <fstream>
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
648 #include <cassert>
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
649 #include "cuda.h"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
650
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
651
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
652 void checkCudaErrors(CUresult err) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
653 assert(err == CUDA_SUCCESS);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
654 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
655
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
656 /// main - Program entry point
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
657 int main(int argc, char **argv) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
658 CUdevice device;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
659 CUmodule cudaModule;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
660 CUcontext context;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
661 CUfunction function;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
662 CUlinkState linker;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
663 int devCount;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
664
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
665 // CUDA initialization
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
666 checkCudaErrors(cuInit(0));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
667 checkCudaErrors(cuDeviceGetCount(&devCount));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
668 checkCudaErrors(cuDeviceGet(&device, 0));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
669
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
670 char name[128];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
671 checkCudaErrors(cuDeviceGetName(name, 128, device));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
672 std::cout << "Using CUDA Device [0]: " << name << "\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
673
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
674 int devMajor, devMinor;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
675 checkCudaErrors(cuDeviceComputeCapability(&devMajor, &devMinor, device));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
676 std::cout << "Device Compute Capability: "
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
677 << devMajor << "." << devMinor << "\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
678 if (devMajor < 2) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
679 std::cerr << "ERROR: Device 0 is not SM 2.0 or greater\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
680 return 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
681 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
682
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
683 std::ifstream t("kernel.ptx");
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
684 if (!t.is_open()) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
685 std::cerr << "kernel.ptx not found\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
686 return 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
687 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
688 std::string str((std::istreambuf_iterator<char>(t)),
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
689 std::istreambuf_iterator<char>());
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
690
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
691 // Create driver context
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
692 checkCudaErrors(cuCtxCreate(&context, 0, device));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
693
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
694 // Create module for object
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
695 checkCudaErrors(cuModuleLoadDataEx(&cudaModule, str.c_str(), 0, 0, 0));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
696
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
697 // Get kernel function
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
698 checkCudaErrors(cuModuleGetFunction(&function, cudaModule, "kernel"));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
699
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
700 // Device data
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
701 CUdeviceptr devBufferA;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
702 CUdeviceptr devBufferB;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
703 CUdeviceptr devBufferC;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
704
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
705 checkCudaErrors(cuMemAlloc(&devBufferA, sizeof(float)*16));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
706 checkCudaErrors(cuMemAlloc(&devBufferB, sizeof(float)*16));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
707 checkCudaErrors(cuMemAlloc(&devBufferC, sizeof(float)*16));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
708
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
709 float* hostA = new float[16];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
710 float* hostB = new float[16];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
711 float* hostC = new float[16];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
712
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
713 // Populate input
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
714 for (unsigned i = 0; i != 16; ++i) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
715 hostA[i] = (float)i;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
716 hostB[i] = (float)(2*i);
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
717 hostC[i] = 0.0f;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
718 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
719
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
720 checkCudaErrors(cuMemcpyHtoD(devBufferA, &hostA[0], sizeof(float)*16));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
721 checkCudaErrors(cuMemcpyHtoD(devBufferB, &hostB[0], sizeof(float)*16));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
722
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
723
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
724 unsigned blockSizeX = 16;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
725 unsigned blockSizeY = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
726 unsigned blockSizeZ = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
727 unsigned gridSizeX = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
728 unsigned gridSizeY = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
729 unsigned gridSizeZ = 1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
730
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
731 // Kernel parameters
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
732 void *KernelParams[] = { &devBufferA, &devBufferB, &devBufferC };
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
733
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
734 std::cout << "Launching kernel\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
735
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
736 // Kernel launch
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
737 checkCudaErrors(cuLaunchKernel(function, gridSizeX, gridSizeY, gridSizeZ,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
738 blockSizeX, blockSizeY, blockSizeZ,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
739 0, NULL, KernelParams, NULL));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
740
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
741 // Retrieve device data
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
742 checkCudaErrors(cuMemcpyDtoH(&hostC[0], devBufferC, sizeof(float)*16));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
743
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
744
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
745 std::cout << "Results:\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
746 for (unsigned i = 0; i != 16; ++i) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
747 std::cout << hostA[i] << " + " << hostB[i] << " = " << hostC[i] << "\n";
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
748 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
749
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
750
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
751 // Clean up after ourselves
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
752 delete [] hostA;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
753 delete [] hostB;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
754 delete [] hostC;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
755
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
756 // Clean-up
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
757 checkCudaErrors(cuMemFree(devBufferA));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
758 checkCudaErrors(cuMemFree(devBufferB));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
759 checkCudaErrors(cuMemFree(devBufferC));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
760 checkCudaErrors(cuModuleUnload(cudaModule));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
761 checkCudaErrors(cuCtxDestroy(context));
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
762
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
763 return 0;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
764 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
765
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
766
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
767 You will need to link with the CUDA driver and specify the path to cuda.h.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
768
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
769 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
770
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
771 # clang++ sample.cpp -o sample -O2 -g -I/usr/local/cuda-5.5/include -lcuda
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
772
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
773 We don't need to specify a path to ``libcuda.so`` since this is installed in a
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
774 system location by the driver, not the CUDA toolkit.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
775
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
776 If everything goes as planned, you should see the following output when
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
777 running the compiled program:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
778
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
779 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
780
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
781 Using CUDA Device [0]: GeForce GTX 680
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
782 Device Compute Capability: 3.0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
783 Launching kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
784 Results:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
785 0 + 0 = 0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
786 1 + 2 = 3
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
787 2 + 4 = 6
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
788 3 + 6 = 9
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
789 4 + 8 = 12
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
790 5 + 10 = 15
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
791 6 + 12 = 18
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
792 7 + 14 = 21
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
793 8 + 16 = 24
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
794 9 + 18 = 27
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
795 10 + 20 = 30
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
796 11 + 22 = 33
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
797 12 + 24 = 36
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
798 13 + 26 = 39
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
799 14 + 28 = 42
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
800 15 + 30 = 45
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
801
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
802 .. note::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
803
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
804 You will likely see a different device identifier based on your hardware
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
805
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
806
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
807 Tutorial: Linking with Libdevice
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
808 ================================
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
809
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
810 In this tutorial, we show a simple example of linking LLVM IR with the
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
811 libdevice library. We will use the same kernel as the previous tutorial,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
812 except that we will compute ``C = pow(A, B)`` instead of ``C = A + B``.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
813 Libdevice provides an ``__nv_powf`` function that we will use.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
814
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
815 .. code-block:: llvm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
816
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
817 target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
818 target triple = "nvptx64-nvidia-cuda"
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
819
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
820 ; Intrinsic to read X component of thread ID
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
821 declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
822 ; libdevice function
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
823 declare float @__nv_powf(float, float)
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
824
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
825 define void @kernel(float addrspace(1)* %A,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
826 float addrspace(1)* %B,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
827 float addrspace(1)* %C) {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
828 entry:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
829 ; What is my ID?
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
830 %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
831
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
832 ; Compute pointers into A, B, and C
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
833 %ptrA = getelementptr float addrspace(1)* %A, i32 %id
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
834 %ptrB = getelementptr float addrspace(1)* %B, i32 %id
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
835 %ptrC = getelementptr float addrspace(1)* %C, i32 %id
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
836
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
837 ; Read A, B
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
838 %valA = load float addrspace(1)* %ptrA, align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
839 %valB = load float addrspace(1)* %ptrB, align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
840
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
841 ; Compute C = pow(A, B)
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
842 %valC = call float @__nv_powf(float %valA, float %valB)
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
843
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
844 ; Store back to C
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
845 store float %valC, float addrspace(1)* %ptrC, align 4
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
846
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
847 ret void
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
848 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
849
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
850 !nvvm.annotations = !{!0}
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
851 !0 = metadata !{void (float addrspace(1)*,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
852 float addrspace(1)*,
77
54457678186b LLVM 3.6
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents: 0
diff changeset
853 float addrspace(1)*)* @kernel, metadata !"kernel", i32 1}
0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
854
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
855
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
856 To compile this kernel, we perform the following steps:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
857
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
858 1. Link with libdevice
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
859 2. Internalize all but the public kernel function
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
860 3. Run ``NVVMReflect`` and set ``__CUDA_FTZ`` to 0
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
861 4. Optimize the linked module
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
862 5. Codegen the module
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
863
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
864
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
865 These steps can be performed by the LLVM ``llvm-link``, ``opt``, and ``llc``
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
866 tools. In a complete compiler, these steps can also be performed entirely
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
867 programmatically by setting up an appropriate pass configuration (see
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
868 :ref:`libdevice`).
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
869
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
870 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
871
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
872 # llvm-link t2.bc libdevice.compute_20.10.bc -o t2.linked.bc
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
873 # opt -internalize -internalize-public-api-list=kernel -nvvm-reflect-list=__CUDA_FTZ=0 -nvvm-reflect -O3 t2.linked.bc -o t2.opt.bc
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
874 # llc -mcpu=sm_20 t2.opt.bc -o t2.ptx
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
875
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
876 .. note::
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
877
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
878 The ``-nvvm-reflect-list=_CUDA_FTZ=0`` is not strictly required, as any
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
879 undefined variables will default to zero. It is shown here for evaluation
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
880 purposes.
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
881
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
882
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
883 This gives us the following PTX (excerpt):
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
884
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
885 .. code-block:: text
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
886
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
887 //
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
888 // Generated by LLVM NVPTX Back-End
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
889 //
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
890
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
891 .version 3.1
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
892 .target sm_20
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
893 .address_size 64
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
894
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
895 // .globl kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
896 // @kernel
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
897 .visible .entry kernel(
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
898 .param .u64 kernel_param_0,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
899 .param .u64 kernel_param_1,
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
900 .param .u64 kernel_param_2
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
901 )
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
902 {
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
903 .reg .pred %p<30>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
904 .reg .f32 %f<111>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
905 .reg .s32 %r<21>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
906 .reg .s64 %rl<8>;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
907
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
908 // BB#0: // %entry
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
909 ld.param.u64 %rl2, [kernel_param_0];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
910 mov.u32 %r3, %tid.x;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
911 ld.param.u64 %rl3, [kernel_param_1];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
912 mul.wide.s32 %rl4, %r3, 4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
913 add.s64 %rl5, %rl2, %rl4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
914 ld.param.u64 %rl6, [kernel_param_2];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
915 add.s64 %rl7, %rl3, %rl4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
916 add.s64 %rl1, %rl6, %rl4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
917 ld.global.f32 %f1, [%rl5];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
918 ld.global.f32 %f2, [%rl7];
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
919 setp.eq.f32 %p1, %f1, 0f3F800000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
920 setp.eq.f32 %p2, %f2, 0f00000000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
921 or.pred %p3, %p1, %p2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
922 @%p3 bra BB0_1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
923 bra.uni BB0_2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
924 BB0_1:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
925 mov.f32 %f110, 0f3F800000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
926 st.global.f32 [%rl1], %f110;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
927 ret;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
928 BB0_2: // %__nv_isnanf.exit.i
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
929 abs.f32 %f4, %f1;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
930 setp.gtu.f32 %p4, %f4, 0f7F800000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
931 @%p4 bra BB0_4;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
932 // BB#3: // %__nv_isnanf.exit5.i
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
933 abs.f32 %f5, %f2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
934 setp.le.f32 %p5, %f5, 0f7F800000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
935 @%p5 bra BB0_5;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
936 BB0_4: // %.critedge1.i
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
937 add.f32 %f110, %f1, %f2;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
938 st.global.f32 [%rl1], %f110;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
939 ret;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
940 BB0_5: // %__nv_isinff.exit.i
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
941
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
942 ...
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
943
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
944 BB0_26: // %__nv_truncf.exit.i.i.i.i.i
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
945 mul.f32 %f90, %f107, 0f3FB8AA3B;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
946 cvt.rzi.f32.f32 %f91, %f90;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
947 mov.f32 %f92, 0fBF317200;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
948 fma.rn.f32 %f93, %f91, %f92, %f107;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
949 mov.f32 %f94, 0fB5BFBE8E;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
950 fma.rn.f32 %f95, %f91, %f94, %f93;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
951 mul.f32 %f89, %f95, 0f3FB8AA3B;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
952 // inline asm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
953 ex2.approx.ftz.f32 %f88,%f89;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
954 // inline asm
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
955 add.f32 %f96, %f91, 0f00000000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
956 ex2.approx.f32 %f97, %f96;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
957 mul.f32 %f98, %f88, %f97;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
958 setp.lt.f32 %p15, %f107, 0fC2D20000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
959 selp.f32 %f99, 0f00000000, %f98, %p15;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
960 setp.gt.f32 %p16, %f107, 0f42D20000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
961 selp.f32 %f110, 0f7F800000, %f99, %p16;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
962 setp.eq.f32 %p17, %f110, 0f7F800000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
963 @%p17 bra BB0_28;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
964 // BB#27:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
965 fma.rn.f32 %f110, %f110, %f108, %f110;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
966 BB0_28: // %__internal_accurate_powf.exit.i
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
967 setp.lt.f32 %p18, %f1, 0f00000000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
968 setp.eq.f32 %p19, %f3, 0f3F800000;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
969 and.pred %p20, %p18, %p19;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
970 @!%p20 bra BB0_30;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
971 bra.uni BB0_29;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
972 BB0_29:
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
973 mov.b32 %r9, %f110;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
974 xor.b32 %r10, %r9, -2147483648;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
975 mov.b32 %f110, %r10;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
976 BB0_30: // %__nv_powf.exit
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
977 st.global.f32 [%rl1], %f110;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
978 ret;
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
979 }
95c75e76d11b LLVM 3.4
Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
parents:
diff changeset
980