annotate docs/CompileCudaWithLLVM.rst @ 107:a03ddd01be7e

resolve warnings
author Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
date Sun, 31 Jan 2016 17:34:49 +0900
parents 7d135dc70f03
children 1172e4bd9c6f
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
100
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 ===================================
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 Compiling CUDA C/C++ with LLVM
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 ===================================
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 .. contents::
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 :local:
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 Introduction
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9 ============
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 This document contains the user guides and the internals of compiling CUDA
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12 C/C++ with LLVM. It is aimed at both users who want to compile CUDA with LLVM
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 and developers who want to improve LLVM for GPUs. This document assumes a basic
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 familiarity with CUDA. Information about CUDA programming can be found in the
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 `CUDA programming guide
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 <http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html>`_.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
17
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18 How to Build LLVM with CUDA Support
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 ===================================
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21 Below is a quick summary of downloading and building LLVM. Consult the `Getting
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22 Started <http://llvm.org/docs/GettingStarted.html>`_ page for more details on
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 setting up LLVM.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25 #. Checkout LLVM
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27 .. code-block:: console
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29 $ cd where-you-want-llvm-to-live
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30 $ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32 #. Checkout Clang
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34 .. code-block:: console
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36 $ cd where-you-want-llvm-to-live
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37 $ cd llvm/tools
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38 $ svn co http://llvm.org/svn/llvm-project/cfe/trunk clang
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 #. Configure and build LLVM and Clang
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42 .. code-block:: console
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44 $ cd where-you-want-llvm-to-live
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45 $ mkdir build
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46 $ cd build
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47 $ cmake [options] ..
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48 $ make
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50 How to Compile CUDA C/C++ with LLVM
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51 ===================================
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53 We assume you have installed the CUDA driver and runtime. Consult the `NVIDIA
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
54 CUDA installation Guide
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55 <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ if
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 you have not.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58 Suppose you want to compile and run the following CUDA program (``axpy.cu``)
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59 which multiplies a ``float`` array by a ``float`` scalar (AXPY).
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 .. code-block:: c++
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63 #include <helper_cuda.h> // for checkCudaErrors
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
64
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65 #include <iostream>
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
66
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67 __global__ void axpy(float a, float* x, float* y) {
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
68 y[threadIdx.x] = a * x[threadIdx.x];
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
69 }
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
70
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
71 int main(int argc, char* argv[]) {
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
72 const int kDataLen = 4;
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
73
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
74 float a = 2.0f;
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
75 float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
76 float host_y[kDataLen];
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
77
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
78 // Copy input data to device.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
79 float* device_x;
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
80 float* device_y;
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
81 checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float)));
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
82 checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float)));
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
83 checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
84 cudaMemcpyHostToDevice));
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
85
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
86 // Launch the kernel.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
87 axpy<<<1, kDataLen>>>(a, device_x, device_y);
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
88
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
89 // Copy output data to host.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
90 checkCudaErrors(cudaDeviceSynchronize());
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
91 checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
92 cudaMemcpyDeviceToHost));
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
93
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
94 // Print the results.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
95 for (int i = 0; i < kDataLen; ++i) {
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
96 std::cout << "y[" << i << "] = " << host_y[i] << "\n";
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
97 }
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
98
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
99 checkCudaErrors(cudaDeviceReset());
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
100 return 0;
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
101 }
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
102
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
103 The command line for compilation is similar to what you would use for C++.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
104
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
105 .. code-block:: console
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
106
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
107 $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
108 $ ./axpy
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
109 y[0] = 2
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
110 y[1] = 4
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
111 y[2] = 6
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
112 y[3] = 8
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
113
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
114 Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
115 samples installed for this example. ``<CUDA install path>`` is the root
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
116 directory where you installed CUDA SDK, typically ``/usr/local/cuda``.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
117
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
118 Optimizations
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
119 =============
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
120
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
121 CPU and GPU have different design philosophies and architectures. For example, a
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
122 typical CPU has branch prediction, out-of-order execution, and is superscalar,
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
123 whereas a typical GPU has none of these. Due to such differences, an
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
124 optimization pipeline well-tuned for CPUs may be not suitable for GPUs.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
125
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
126 LLVM performs several general and CUDA-specific optimizations for GPUs. The
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
127 list below shows some of the more important optimizations for GPUs. Most of
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
128 them have been upstreamed to ``lib/Transforms/Scalar`` and
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
129 ``lib/Target/NVPTX``. A few of them have not been upstreamed due to lack of a
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
130 customizable target-independent optimization pipeline.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
131
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
132 * **Straight-line scalar optimizations**. These optimizations reduce redundancy
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
133 in straight-line code. Details can be found in the `design document for
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
134 straight-line scalar optimizations <https://goo.gl/4Rb9As>`_.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
135
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
136 * **Inferring memory spaces**. `This optimization
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
137 <http://www.llvm.org/docs/doxygen/html/NVPTXFavorNonGenericAddrSpaces_8cpp_source.html>`_
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
138 infers the memory space of an address so that the backend can emit faster
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
139 special loads and stores from it. Details can be found in the `design
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
140 document for memory space inference <https://goo.gl/5wH2Ct>`_.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
141
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
142 * **Aggressive loop unrooling and function inlining**. Loop unrolling and
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
143 function inlining need to be more aggressive for GPUs than for CPUs because
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
144 control flow transfer in GPU is more expensive. They also promote other
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
145 optimizations such as constant propagation and SROA which sometimes speed up
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
146 code by over 10x. An empirical inline threshold for GPUs is 1100. This
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
147 configuration has yet to be upstreamed with a target-specific optimization
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
148 pipeline. LLVM also provides `loop unrolling pragmas
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
149 <http://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll>`_
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
150 and ``__attribute__((always_inline))`` for programmers to force unrolling and
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
151 inling.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
152
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
153 * **Aggressive speculative execution**. `This transformation
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
154 <http://llvm.org/docs/doxygen/html/SpeculativeExecution_8cpp_source.html>`_ is
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
155 mainly for promoting straight-line scalar optimizations which are most
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
156 effective on code along dominator paths.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
157
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
158 * **Memory-space alias analysis**. `This alias analysis
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
159 <http://reviews.llvm.org/D12414>`_ infers that two pointers in different
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
160 special memory spaces do not alias. It has yet to be integrated to the new
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
161 alias analysis infrastructure; the new infrastructure does not run
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
162 target-specific alias analysis.
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
163
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
164 * **Bypassing 64-bit divides**. `An existing optimization
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
165 <http://llvm.org/docs/doxygen/html/BypassSlowDivision_8cpp_source.html>`_
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
166 enabled in the NVPTX backend. 64-bit integer divides are much slower than
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
167 32-bit ones on NVIDIA GPUs due to lack of a divide unit. Many of the 64-bit
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
168 divides in our benchmarks have a divisor and dividend which fit in 32-bits at
7d135dc70f03 LLVM 3.9
Miyagi Mitsuki <e135756@ie.u-ryukyu.ac.jp>
parents:
diff changeset
169 runtime. This optimization provides a fast path for this common case.