comparison docs/CompileCudaWithLLVM.rst @ 148:63bd29f05246

merged
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Wed, 14 Aug 2019 19:46:37 +0900
parents c2174574ed3a
146:3fc4d5c3e21e 148:63bd29f05246
20 =================== 20 ===================
21 21
22 Prerequisites 22 Prerequisites
23 ------------- 23 -------------
24 24
25 CUDA is supported in llvm 3.9, but it's still in active development, so we 25 CUDA is supported since llvm 3.9. Current release of clang (7.0.0) supports CUDA
26 recommend you `compile clang/LLVM from HEAD 26 7.0 through 9.2. If you need support for CUDA 10, you will need to use clang
27 <http://llvm.org/docs/GettingStarted.html>`_. 27 built from r342924 or newer.
28 28
29 Before you build CUDA code, you'll need to have installed the appropriate 29 Before you build CUDA code, you'll need to have installed the appropriate driver
30 driver for your nvidia GPU and the CUDA SDK. See `NVIDIA's CUDA installation 30 for your nvidia GPU and the CUDA SDK. See `NVIDIA's CUDA installation guide
31 guide <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ 31 <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ for
32 for details. Note that clang `does not support 32 details. Note that clang `does not support
33 <https://llvm.org/bugs/show_bug.cgi?id=26966>`_ the CUDA toolkit as installed 33 <https://llvm.org/bugs/show_bug.cgi?id=26966>`_ the CUDA toolkit as installed by
34 by many Linux package managers; you probably need to install nvidia's package. 34 many Linux package managers; you probably need to install CUDA in a single
35 35 directory from NVIDIA's package.
36 You will need CUDA 7.0, 7.5, or 8.0 to compile with clang. 36
37 37 CUDA compilation is supported on Linux. Compilation on MacOS and Windows may or
38 CUDA compilation is supported on Linux, on MacOS as of 2016-11-18, and on 38 may not work and currently have no maintainers. Compilation with CUDA-9.x is
39 Windows as of 2017-01-05. 39 `currently broken on Windows <https://bugs.llvm.org/show_bug.cgi?id=38811>`_.
40 40
41 Invoking clang 41 Invoking clang
42 -------------- 42 --------------
43 43
44 Invoking clang for CUDA compilation works similarly to compiling regular C++. 44 Invoking clang for CUDA compilation works similarly to compiling regular C++.
71 Typically, ``/usr/local/cuda``. 71 Typically, ``/usr/local/cuda``.
72 72
73 Pass e.g. ``-L/usr/local/cuda/lib64`` if compiling in 64-bit mode; otherwise, 73 Pass e.g. ``-L/usr/local/cuda/lib64`` if compiling in 64-bit mode; otherwise,
74 pass e.g. ``-L/usr/local/cuda/lib``. (In CUDA, the device code and host code 74 pass e.g. ``-L/usr/local/cuda/lib``. (In CUDA, the device code and host code
75 always have the same pointer widths, so if you're compiling 64-bit code for 75 always have the same pointer widths, so if you're compiling 64-bit code for
76 the host, you're also compiling 64-bit code for the device.) 76 the host, you're also compiling 64-bit code for the device.) Note that as of
77 v10.0 CUDA SDK `no longer supports compilation of 32-bit
78 applications <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#deprecated-features>`_.
77 79
78 * ``<GPU arch>`` -- the `compute capability 80 * ``<GPU arch>`` -- the `compute capability
79 <https://developer.nvidia.com/cuda-gpus>`_ of your GPU. For example, if you 81 <https://developer.nvidia.com/cuda-gpus>`_ of your GPU. For example, if you
80 want to run your program on a GPU with compute capability of 3.5, specify 82 want to run your program on a GPU with compute capability of 3.5, specify
81 ``--cuda-gpu-arch=sm_35``. 83 ``--cuda-gpu-arch=sm_35``.
87 89
88 You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs. 90 You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs.
89 91
90 The `-L` and `-l` flags only need to be passed when linking. When compiling, 92 The `-L` and `-l` flags only need to be passed when linking. When compiling,
91 you may also need to pass ``--cuda-path=/path/to/cuda`` if you didn't install 93 you may also need to pass ``--cuda-path=/path/to/cuda`` if you didn't install
92 the CUDA SDK into ``/usr/local/cuda``, ``/usr/local/cuda-7.0``, or 94 the CUDA SDK into ``/usr/local/cuda`` or ``/usr/local/cuda-X.Y``.
93 ``/usr/local/cuda-7.5``.
94 95
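Reviewer's sketch (not part of either revision): the flags described in this hunk can be combined into one command line roughly as follows. The script only prints the command and does not run clang. The names ``CUDA_PATH``, ``GPU_ARCHS``, ``arch_flags``, ``cuda_cmd`` and the file ``axpy.cu`` are illustrative assumptions, as are the two architectures chosen. The runtime library flags (``-lcudart_static -ldl -lrt -pthread``) are recalled from the upstream document's own example, which this comparison elides, so check them against your toolchain.

```shell
#!/bin/sh
# Hypothetical helper: assembles and prints a clang++ CUDA command line.
# All paths and names here are assumptions chosen for illustration.
CUDA_PATH=/usr/local/cuda      # typical install location per the text above
GPU_ARCHS="sm_35 sm_60"        # example compute capabilities

# --cuda-gpu-arch may be passed multiple times, once per target arch.
arch_flags=""
for arch in $GPU_ARCHS; do
  arch_flags="$arch_flags --cuda-gpu-arch=$arch"
done

# --cuda-path matters when compiling; -L and -l matter only when linking.
# lib64 is used because a 64-bit host implies 64-bit device code.
cuda_cmd="clang++ axpy.cu -o axpy --cuda-path=$CUDA_PATH$arch_flags -L$CUDA_PATH/lib64 -lcudart_static -ldl -lrt -pthread"

echo "$cuda_cmd"
```

If the SDK lives in ``/usr/local/cuda`` or ``/usr/local/cuda-X.Y``, the ``--cuda-path`` flag should be redundant according to the right-hand revision; it is kept here only to show where it goes.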
95 Flags that control numerical code 96 Flags that control numerical code
96 --------------------------------- 97 ---------------------------------
97 98
98 If you're using GPUs, you probably care about making numerical code run fast. 99 If you're using GPUs, you probably care about making numerical code run fast.
140 141
141 ``<math.h>`` and ``<cmath>`` 142 ``<math.h>`` and ``<cmath>``
142 ---------------------------- 143 ----------------------------
143 144
144 In clang, ``math.h`` and ``cmath`` are available and `pass 145 In clang, ``math.h`` and ``cmath`` are available and `pass
145 <https://github.com/llvm-mirror/test-suite/blob/master/External/CUDA/math_h.cu>`_ 146 <https://github.com/llvm/llvm-test-suite/blob/master/External/CUDA/math_h.cu>`_
146 `tests 147 `tests
147 <https://github.com/llvm-mirror/test-suite/blob/master/External/CUDA/cmath.cu>`_ 148 <https://github.com/llvm/llvm-test-suite/blob/master/External/CUDA/cmath.cu>`_
148 adapted from libc++'s test suite. 149 adapted from libc++'s test suite.
149 150
150 In nvcc ``math.h`` and ``cmath`` are mostly available. Versions of ``::foof`` 151 In nvcc ``math.h`` and ``cmath`` are mostly available. Versions of ``::foof``
151 in namespace std (e.g. ``std::sinf``) are not available, and where the standard 152 in namespace std (e.g. ``std::sinf``) are not available, and where the standard
152 calls for overloads that take integral arguments, these are usually not 153 calls for overloads that take integral arguments, these are usually not
546 547
547 | `gpucc: An Open-Source GPGPU Compiler <http://dl.acm.org/citation.cfm?id=2854041>`_ 548 | `gpucc: An Open-Source GPGPU Compiler <http://dl.acm.org/citation.cfm?id=2854041>`_
548 | Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt 549 | Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt
549 | *Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO 2016)* 550 | *Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO 2016)*
550 | 551 |
551 | `Slides from the CGO talk <http://wujingyue.com/docs/gpucc-talk.pdf>`_ 552 | `Slides from the CGO talk <http://wujingyue.github.io/docs/gpucc-talk.pdf>`_
552 | 553 |
553 | `Tutorial given at CGO <http://wujingyue.com/docs/gpucc-tutorial.pdf>`_ 554 | `Tutorial given at CGO <http://wujingyue.github.io/docs/gpucc-tutorial.pdf>`_
554 555
555 Obtaining Help 556 Obtaining Help
556 ============== 557 ==============
557 558
558 To obtain help on LLVM in general and its CUDA support, see `the LLVM 559 To obtain help on LLVM in general and its CUDA support, see `the LLVM