Support, Getting Involved, and FAQ
==================================

Please do not hesitate to reach out to us on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`_ or join
one of our :ref:`regular calls <calls>`. Some common questions are answered in
the :ref:`faq`.

.. _calls:

Calls
-----

OpenMP in LLVM Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP (and OpenACC) in the LLVM Project, including Clang, optimization, and runtime work.
- Join `OpenMP in LLVM Technical Call <https://bluejeans.com/544112769//webrtc>`__.
- Time: Weekly call every Wednesday at 7:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit>`__.
- Status tracking `page <https://openmp.llvm.org/docs>`__.


OpenMP in Flang Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP and OpenACC in the Flang Project.
- Join `OpenMP in Flang Technical Call <https://bit.ly/39eQW3o>`_.
- Time: Weekly call every Thursday at 8:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1yA-MeJf6RYY-ZXpdol0t7YoDoqtwAyBhFLr5thu5pFI>`__.
- Status tracking `page <https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0>`__.


.. _faq:

FAQ
---

.. note::
  The FAQ is a work in progress and most of the expected content is not
  yet available. While you can expect changes, we always welcome feedback and
  additions. Please post on the `Discourse forums (Runtimes - OpenMP)
  <https://discourse.llvm.org/c/runtimes/openmp/35>`__.


Q: How to contribute a patch to the webpage or any other part?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All patches go through the regular `LLVM review process
<https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.


.. _build_offload_capable_compiler:

Q: How to build an OpenMP GPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To build an *effective* OpenMP offload capable compiler, only one extra CMake
option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (generic
information about building LLVM is available `here
<https://llvm.org/docs/GettingStarted.html>`__). Make sure that all backends
targeted by OpenMP are enabled; by default, Clang is built with all backends
enabled. When building with `LLVM_ENABLE_RUNTIMES="openmp"`, OpenMP should not
also be enabled in `LLVM_ENABLE_PROJECTS` because it is enabled by default.
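
For example, a configuration along the following lines should produce an
offload capable toolchain (a sketch; the generator, build type, and source
path are placeholders to adjust for your setup):

.. code-block:: shell

  cmake -G Ninja ../llvm-project/llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="openmp"
  ninja install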

For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

.. note::
  The compiler that generates the offload code should be the same (version) as
  the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
  can be built by a different compiler.

.. _advanced_builds: https://llvm.org/docs/AdvancedBuilds.html

.. _build_nvidia_offload_capable_compiler:

Q: How to build an OpenMP NVidia offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The CUDA SDK is required on the machine that will execute the OpenMP application.

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- `CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX` where `XX` is the architecture of your GPU, e.g., 80.
- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capability of your GPU, e.g., 75.
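
For example, a configuration sketch for such a build, where ``sm_80``/``80``
stand in for your GPU's architecture and compute capability:

.. code-block:: shell

  cmake -G Ninja ../llvm-project/llvm \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="openmp" \
    -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_80 \
    -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=80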

.. _build_amdgpu_offload_capable_compiler:

Q: How to build an OpenMP AMDGPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
required to build the LLVM toolchain and to execute the OpenMP application.
Either install ROCm somewhere that CMake's ``find_package`` can locate it, or
build the required subcomponents ROCt and ROCr from source.

The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime, rocr.
Roct is the userspace part of the Linux driver. It calls into the driver which
ships with the Linux kernel. It is an implementation detail of rocr from
OpenMP's perspective. Rocr is an implementation of `HSA
<http://www.hsafoundation.com>`_.

.. code-block:: shell

  SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
  BUILD_DIR=somewhere
  INSTALL_PREFIX=same-as-llvm-install

  cd $SOURCE_DIR
  git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
    --single-branch
  git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
    --single-branch

  cd $BUILD_DIR && mkdir roct && cd roct
  cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
    -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
  make && make install

  cd $BUILD_DIR && mkdir rocr && cd rocr
  cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
    -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=ON
  make && make install

``IMAGE_SUPPORT`` requires building rocr with clang and is not used by OpenMP.

Provided CMake's ``find_package`` can find the ROCR-Runtime package, LLVM will
build a tool ``bin/amdgpu-arch`` which will print a string like ``gfx906`` when
run if it recognises a GPU on the local system. LLVM will also build a shared
library, ``libomptarget.rtl.amdgpu.so``, which is linked against rocr.
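
After installing LLVM, that tool can be used to confirm the GPU is visible to
the toolchain:

.. code-block:: shell

  $INSTALL_PREFIX/bin/amdgpu-arch  # prints e.g. gfx906 if a GPU is recognised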

With those libraries installed, and LLVM built and installed, try:

.. code-block:: shell

  clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example

Q: What are the known limitations of OpenMP AMDGPU offload?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``LD_LIBRARY_PATH`` or rpath/runpath entries are required to find ``libomp.so``
and ``libomptarget.so``.

There is no libc. That is, ``malloc`` and ``printf`` do not exist. Libm is
implemented in terms of the ROCm device library, which will be searched for
when linking with ``-lm``.

Some versions of the driver for the Radeon VII (gfx906) will error unless the
environment variable ``HSA_IGNORE_SRAMECC_MISREPORT=1`` is set.
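
A minimal sketch of the environment setup before running, assuming the install
prefix used above:

.. code-block:: shell

  export LD_LIBRARY_PATH=$INSTALL_PREFIX/lib:$LD_LIBRARY_PATH
  # Only needed on affected gfx906 driver versions.
  export HSA_IGNORE_SRAMECC_MISREPORT=1
  ./example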

AMDGPU offload is a recent addition to LLVM and the implementation differs from
that which has been shipping in ROCm and AOMP for some time. Early adopters will
encounter bugs.

Q: What are the LLVM components used in offloading and how are they found?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The libraries used by an executable compiled for target offloading are:

- ``libomp.so`` (or similar), the host OpenMP runtime
- ``libomptarget.so``, the target-agnostic target offloading OpenMP runtime
- plugins loaded by ``libomptarget.so``:

  - ``libomptarget.rtl.amdgpu.so``
  - ``libomptarget.rtl.cuda.so``
  - ``libomptarget.rtl.x86_64.so``
  - ``libomptarget.rtl.ve.so``
  - and others

- dependencies of those plugins, e.g. cuda/rocr for nvptx/amdgpu

The compiled executable is dynamically linked against a host runtime, e.g.
``libomp.so``, and against the target offloading runtime, ``libomptarget.so``.
These are found like any other dynamic library: by setting rpath or runpath on
the executable, by setting ``LD_LIBRARY_PATH``, or by adding them to the system
search path.
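
One way to check which host runtimes the loader resolves for an executable is
``ldd`` (a diagnostic sketch; the plugins are loaded later by
``libomptarget.so`` and so will not appear here):

.. code-block:: shell

  ldd ./example | grep -E 'libomp|libomptarget'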

``libomptarget.so`` has rpath or runpath (whichever the system default is) set
to ``$ORIGIN``, and the plugins are located next to it, so it will find the
plugins without any environment variables set. If ``LD_LIBRARY_PATH`` is set,
whether it overrides which plugin is found depends on whether your system treats
``-Wl,-rpath`` as RPATH or RUNPATH.

The plugins will try to find their dependencies in a plugin-dependent fashion.

The cuda plugin is dynamically linked against libcuda if cmake found it at
compiler build time. Otherwise it will attempt to dlopen ``libcuda.so``. It does
not have rpath set.

The amdgpu plugin is linked against ROCr if cmake found it at compiler build
time. Otherwise it will attempt to dlopen ``libhsa-runtime64.so``. It has rpath
set to ``$ORIGIN``, so installing ``libhsa-runtime64.so`` in the same directory
is a way to locate it without environment variables.

In addition to those, there is a compiler runtime library called deviceRTL.
This is compiled from mostly common code into an architecture-specific
bitcode library, e.g. ``libomptarget-nvptx-sm_70.bc``.

Clang and the deviceRTL need to match closely as the interface between them
changes frequently. Using both from the same monorepo checkout is strongly
recommended.

Unlike the host side, which lets environment variables select components, the
deviceRTL that is located in the clang lib directory is preferred. Only if it
is absent is the ``LIBRARY_PATH`` environment variable searched to find a
bitcode file with the right name. This can be overridden by passing a clang
flag, ``--libomptarget-nvptx-bc-path`` or ``--libomptarget-amdgcn-bc-path``,
which can specify a directory or an exact bitcode file to use.
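
For instance, to point clang at an explicit deviceRTL bitcode file (the path is
a placeholder for wherever your bitcode library lives):

.. code-block:: shell

  clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
    --libomptarget-nvptx-bc-path=/path/to/libomptarget-nvptx-sm_70.bc \
    example.c -o example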


Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

Q: Does OpenMP offloading support work in packages distributed as part of my OS?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.


.. _math_and_complex_in_target_regions:

Q: Does Clang support `<math.h>` and `<complex.h>` operations in OpenMP target on GPUs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, LLVM/Clang allows math functions and complex arithmetic inside OpenMP
target regions that are compiled for GPUs.

Clang provides a set of wrapper headers that are found first when `math.h` and
`complex.h` (for C), `cmath` and `complex` (for C++), or similar headers are
included by the application. These wrappers will eventually include the system
version of the corresponding header file after setting up a target device
specific environment. The inclusion of the system header is important because
system headers differ based on the architecture and operating system and may
contain preprocessor, variable, and function definitions that need to be
available in the target region regardless of the targeted device architecture.
However, various functions may require specialized device versions, e.g.,
`sin`, and others are only available on certain devices, e.g., `__umul64hi`. To
provide "native" support for math and complex on the respective architecture,
Clang will wrap the "native" math functions, e.g., as provided by the device
vendor, in an OpenMP begin/end declare variant. These functions will then be
picked up instead of the host versions, while host-only variables and function
definitions remain available. Complex arithmetic and functions are supported
through a similar mechanism. It is worth noting that this support requires
`extensions to the OpenMP begin/end declare variant context selector
<https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
that are exposed through LLVM/Clang to the user as well.

Q: What is a way to debug errors from mapping memory to a target device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An experimental way to debug these errors is to use :ref:`remote process
offloading <remote_offloading_plugin>`. By using ``libomptarget.rtl.rpc.so``
and ``openmp-offloading-server``, it is possible to explicitly perform memory
transfers between processes on the host CPU and run sanitizers while doing so
in order to catch these errors.

Q: Why does my application say "Named symbol not found" and abort when I run it?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is most likely caused by trying to use OpenMP offloading with static
libraries. Static libraries do not contain any device code, so when the runtime
attempts to execute the target region it will not be found and you will get an
error like this:

.. code-block:: text

  CUDA error: Loading '__omp_offloading_fd02_3231c15__Z3foov_l2' Failed
  CUDA error: named symbol not found
  Libomptarget error: Unable to generate entries table for device id 0.

Currently, the only solution is to change how the application is built and avoid
the use of static libraries.

Q: Can I use dynamically linked libraries with OpenMP offloading?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamically linked libraries can only be used if there is no device code split
between the library and the application. Anything declared on the device inside
the shared library will not be visible to the application when it is linked.

Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Enabling the OpenMP runtime will perform a two-stage build for you.
If your host compiler is different from your system-wide compiler, you may need
to set the CMake variable `GCC_INSTALL_PREFIX` so clang will be able to find the
correct GCC toolchain in the second stage of the build.

For example, if your system-wide GCC installation is too old to build LLVM and
you would like to use a newer GCC, set the CMake variable `GCC_INSTALL_PREFIX`
to inform clang of the GCC installation you would like to use in the second stage.
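
A configuration sketch, where ``/opt/gcc-11`` stands in for the prefix of the
newer GCC installation:

.. code-block:: shell

  cmake -G Ninja ../llvm-project/llvm \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_ENABLE_RUNTIMES="openmp" \
    -DGCC_INSTALL_PREFIX=/opt/gcc-11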

Q: How can I include OpenMP offloading support in my CMake project?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, there is an experimental CMake find module for OpenMP target
offloading provided by LLVM. It will attempt to find OpenMP target offloading
support for your compiler. The flags necessary for OpenMP target offloading will
be loaded into the ``OpenMPTarget::OpenMPTarget_<device>`` target or the
``OpenMPTarget_<device>_FLAGS`` variable if successful. Currently supported
devices are ``AMDGPU`` and ``NVPTX``.

To use this module, simply add the path to CMake's current module path and call
``find_package``. The module will be installed with your OpenMP installation by
default. Including OpenMP offloading support in an application should now only
require a few additions.

.. code-block:: cmake

  cmake_minimum_required(VERSION 3.13.4)
  project(offloadTest VERSION 1.0 LANGUAGES CXX)

  list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")

  find_package(OpenMPTarget REQUIRED NVPTX)

  add_executable(offload)
  target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
  target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)

Using this module requires at least CMake version 3.13.4. Supported languages
are C and C++, with Fortran support planned for the future. Compiler support is
best for Clang, but this module should also work for other compiler vendors
such as IBM and GNU.
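
Configuring such a project might then look as follows (a sketch; the chosen
compiler must itself be offload capable, and ``PATH_TO_OPENMP_INSTALL`` is a
placeholder passed through to the snippet above):

.. code-block:: shell

  cmake -B build -DCMAKE_CXX_COMPILER=clang++ \
    -DPATH_TO_OPENMP_INSTALL=/path/to/openmp/install
  cmake --build build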

Q: What does 'Stack size for entry function cannot be statically determined' mean?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that the Nvidia tools will sometimes emit if the offloading
region is too complex. Normally, the CUDA tools attempt to statically determine
how much stack memory each thread will use, so that when the kernel is launched
each thread has as much memory as it needs. If the control flow of the kernel
is too complex, containing recursive calls or nested parallelism, this analysis
can fail. If this warning is triggered it means that the kernel may run out of
stack memory during execution and crash. The environment variable
``LIBOMPTARGET_STACK_SIZE`` can be used to increase the stack size if this
occurs.
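
For example, to raise the stack size for one run (the value is a placeholder
byte count):

.. code-block:: shell

  env LIBOMPTARGET_STACK_SIZE=8192 ./example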

Q: Can OpenMP offloading compile for multiple architectures?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
architectures at once. This allows for executables to be run on different
targets, such as offloading to AMD and NVIDIA GPUs simultaneously, as well as
multiple sub-architectures for the same target. Additionally, archive members
are only extracted from static libraries if their architecture is used,
allowing users to create generic libraries.

The architecture can be specified manually using ``--offload-arch=``. If
``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is given, the
targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch``, the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.

For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
given that the necessary build tools are installed for both.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80

If just given the architectures, the triples will be inferred; otherwise they
can be specified manually.

.. code-block:: shell

  clang example.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
    -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
    -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80

When linking against a static library that contains device code for multiple
architectures, only the images used by the executable will be extracted.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a,sm_70,sm_80 -c
  llvm-ar rcs libexample.a example.o
  clang app.c -fopenmp --offload-arch=gfx90a -o app

The supported device images can be viewed using the ``--offloading`` option with
``llvm-objdump``.

.. code-block:: shell

  clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80 -o example
  llvm-objdump --offloading example

  a.out:  file format elf64-x86-64

  OFFLOADING IMAGE [0]:
  kind      elf
  arch      gfx90a
  triple    amdgcn-amd-amdhsa
  producer  openmp

  OFFLOADING IMAGE [1]:
  kind      elf
  arch      sm_80
  triple    nvptx64-nvidia-cuda
  producer  openmp

Q: Can I link OpenMP offloading with CUDA or HIP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenMP offloading files can currently be experimentally linked with CUDA and HIP
files. This will allow OpenMP to call a CUDA device function or vice versa.
However, the global state will be distinct between the two images at runtime.
This means any global variables will potentially have different values when
queried from OpenMP or CUDA.

Linking CUDA and HIP currently requires enabling a different compilation mode
for CUDA / HIP with ``--offload-new-driver`` and linking using
``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
linkable device image.

.. code-block:: shell

  clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
  clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
  clang++ openmp.o cuda.o --offload-link -o app
|