100
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
1 ===================================
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
2 Compiling CUDA C/C++ with LLVM
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
3 ===================================
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
4
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
5 .. contents::
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
6 :local:
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
7
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
8 Introduction
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
9 ============
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
10
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
11 This document contains the user guides and the internals of compiling CUDA
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
12 C/C++ with LLVM. It is aimed at both users who want to compile CUDA with LLVM
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
13 and developers who want to improve LLVM for GPUs. This document assumes a basic
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
14 familiarity with CUDA. Information about CUDA programming can be found in the
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
15 `CUDA programming guide
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
16 <http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html>`_.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
17
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
18 How to Build LLVM with CUDA Support
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
19 ===================================
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
20
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
21 Below is a quick summary of downloading and building LLVM. Consult the `Getting
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
22 Started <http://llvm.org/docs/GettingStarted.html>`_ page for more details on
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
23 setting up LLVM.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
24
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
25 #. Checkout LLVM
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
26
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
27 .. code-block:: console
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
28
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
29 $ cd where-you-want-llvm-to-live
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
30 $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
31
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
32 #. Checkout Clang
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
33
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
34 .. code-block:: console
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
35
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
36 $ cd where-you-want-llvm-to-live
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
37 $ cd llvm/tools
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
38 $ svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
39
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
40 #. Configure and build LLVM and Clang
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
41
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
42 .. code-block:: console
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
43
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
44 $ cd where-you-want-llvm-to-live
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
45 $ mkdir build
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
46 $ cd build
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
47 $ cmake [options] ..
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
48 $ make
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
49
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
50 How to Compile CUDA C/C++ with LLVM
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
51 ===================================
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
52
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
53 We assume you have installed the CUDA driver and runtime. Consult the `NVIDIA
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
54 CUDA installation Guide
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
55 <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ if
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
56 you have not.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
57
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
58 Suppose you want to compile and run the following CUDA program (``axpy.cu``)
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
59 which multiplies a ``float`` array by a ``float`` scalar (AXPY).
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
60
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
61 .. code-block:: c++
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
62
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
63 #include <helper_cuda.h> // for checkCudaErrors
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
64
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
65 #include <iostream>
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
66
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
67 __global__ void axpy(float a, float* x, float* y) {
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
68 y[threadIdx.x] = a * x[threadIdx.x];
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
69 }
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
70
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
71 int main(int argc, char* argv[]) {
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
72 const int kDataLen = 4;
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
73
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
74 float a = 2.0f;
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
75 float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
76 float host_y[kDataLen];
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
77
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
78 // Copy input data to device.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
79 float* device_x;
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
80 float* device_y;
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
81 checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float)));
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
82 checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float)));
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
83 checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
84 cudaMemcpyHostToDevice));
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
85
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
86 // Launch the kernel.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
87 axpy<<<1, kDataLen>>>(a, device_x, device_y);
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
88
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
89 // Copy output data to host.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
90 checkCudaErrors(cudaDeviceSynchronize());
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
91 checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
92 cudaMemcpyDeviceToHost));
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
93
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
94 // Print the results.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
95 for (int i = 0; i < kDataLen; ++i) {
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
96 std::cout << "y[" << i << "] = " << host_y[i] << "\n";
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
97 }
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
98
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
99 checkCudaErrors(cudaDeviceReset());
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
100 return 0;
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
101 }
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
102
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
103 The command line for compilation is similar to what you would use for C++.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
104
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
105 .. code-block:: console
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
106
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
107 $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
108 $ ./axpy
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
109 y[0] = 2
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
110 y[1] = 4
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
111 y[2] = 6
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
112 y[3] = 8
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
113
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
114 Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
115 samples installed for this example. ``<CUDA install path>`` is the root
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
116 directory where you installed CUDA SDK, typically ``/usr/local/cuda``.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
117
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
118 Optimizations
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
119 =============
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
120
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
121 CPU and GPU have different design philosophies and architectures. For example, a
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
122 typical CPU has branch prediction, out-of-order execution, and is superscalar,
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
123 whereas a typical GPU has none of these. Due to such differences, an
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
124 optimization pipeline well-tuned for CPUs may be not suitable for GPUs.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
125
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
126 LLVM performs several general and CUDA-specific optimizations for GPUs. The
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
127 list below shows some of the more important optimizations for GPUs. Most of
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
128 them have been upstreamed to ``lib/Transforms/Scalar`` and
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
129 ``lib/Target/NVPTX``. A few of them have not been upstreamed due to lack of a
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
130 customizable target-independent optimization pipeline.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
131
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
132 * **Straight-line scalar optimizations**. These optimizations reduce redundancy
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
133 in straight-line code. Details can be found in the `design document for
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
134 straight-line scalar optimizations <https://goo.gl/4Rb9As>`_.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
135
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
136 * **Inferring memory spaces**. `This optimization
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
137 <http://www.llvm.org/docs/doxygen/html/NVPTXFavorNonGenericAddrSpaces_8cpp_source.html>`_
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
138 infers the memory space of an address so that the backend can emit faster
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
139 special loads and stores from it. Details can be found in the `design
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
140 document for memory space inference <https://goo.gl/5wH2Ct>`_.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
141
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
142 * **Aggressive loop unrooling and function inlining**. Loop unrolling and
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
143 function inlining need to be more aggressive for GPUs than for CPUs because
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
144 control flow transfer in GPU is more expensive. They also promote other
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
145 optimizations such as constant propagation and SROA which sometimes speed up
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
146 code by over 10x. An empirical inline threshold for GPUs is 1100. This
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
147 configuration has yet to be upstreamed with a target-specific optimization
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
148 pipeline. LLVM also provides `loop unrolling pragmas
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
149 <http://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll>`_
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
150 and ``__attribute__((always_inline))`` for programmers to force unrolling and
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
151 inling.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
152
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
153 * **Aggressive speculative execution**. `This transformation
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
154 <http://llvm.org/docs/doxygen/html/SpeculativeExecution_8cpp_source.html>`_ is
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
155 mainly for promoting straight-line scalar optimizations which are most
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
156 effective on code along dominator paths.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
157
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
158 * **Memory-space alias analysis**. `This alias analysis
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
159 <http://reviews.llvm.org/D12414>`_ infers that two pointers in different
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
160 special memory spaces do not alias. It has yet to be integrated to the new
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
161 alias analysis infrastructure; the new infrastructure does not run
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
162 target-specific alias analysis.
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
163
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
164 * **Bypassing 64-bit divides**. `An existing optimization
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
165 <http://llvm.org/docs/doxygen/html/BypassSlowDivision_8cpp_source.html>`_
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
166 enabled in the NVPTX backend. 64-bit integer divides are much slower than
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
167 32-bit ones on NVIDIA GPUs due to lack of a divide unit. Many of the 64-bit
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
168 divides in our benchmarks have a divisor and dividend which fit in 32-bits at
|
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
169 runtime. This optimization provides a fast path for this common case.
|