annotate openmp/docs/remarks/OMP110.rst @ 232:70dce7da266c llvm-original
llvm original Jul 20 16:41:34 2021
author | Shinji KONO <kono@ie.u-ryukyu.ac.jp> |
---|---|
date | Wed, 21 Jul 2021 10:27:27 +0900 |
parents | 5f17cb93ff66 |
children | c4bab56944e8 |

.. _omp110:

Moving globalized variable to the stack. [OMP110]
=================================================

This optimization remark indicates that a globalized variable was moved back to
thread-local stack memory on the device. This occurs when the optimization pass
can determine that a globalized variable cannot possibly be shared between
threads, so globalization was unnecessary. Using stack memory is the best-case
scenario for data globalization as the variable can now be stored in fast
register files on the device. This optimization requires that all uses of the
variable are visible to the compiler.

Globalization typically occurs when a pointer to a thread-local variable escapes
the current scope. The compiler needs to be pessimistic and assume that the
pointer could be shared between multiple threads according to the OpenMP
standard. This is expensive on target offloading devices that do not allow
threads to share data by default. Instead, this data must be moved to memory
that can be shared, such as shared or global memory. This optimization moves the
data back from shared or global memory to thread-local stack memory if the data
is not actually shared between the threads.

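For contrast, the following minimal sketch shows a variable that is genuinely
shared between threads, so it would presumably have to stay in shareable memory
and this remark would not be expected for it. The function names
``count_thread``, ``count_threads`` and ``count_threads_on_device`` are
hypothetical and exist only for this illustration. Which remark, if any, is
emitted for such code depends on the compiler version.

.. code-block:: c++

  // Hypothetical helper: each thread of the parallel region updates the
  // caller's variable through the pointer it receives.
  void count_thread(int *counter) {
  #pragma omp atomic
    *counter += 1;
  }

  // `counter` is declared by the thread that encounters the parallel region,
  // but every thread in that region updates it. The variable is really shared,
  // so it cannot live in one thread's private stack memory.
  int count_threads() {
    int counter = 0;
  #pragma omp parallel shared(counter)
    count_thread(&counter);
    return counter;
  }

  // Offloads the call. Without an available device, or when compiled without
  // OpenMP support, the region simply runs on the host.
  int count_threads_on_device() {
    int result = 0;
  #pragma omp target map(from : result)
    result = count_threads();
    return result;
  }

The returned value is the number of threads that executed the parallel region,
which is one when the code is built without OpenMP support.
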
Examples
--------

A trivial example of globalization can be seen below. The compiler sees that a
pointer to the thread-local variable ``x`` escapes the current scope and must
globalize it, even though doing so is not actually necessary. Fortunately, this
optimization can undo the globalization by examining how ``x`` is used.

.. code-block:: c++

  void use(int *x) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  omp110.cpp:6:7: remark: Moving globalized variable to the stack. [OMP110]
    int x;
        ^

A less trivial example uses C++'s complex numbers. In this case the overloaded
arithmetic operators cause pointers to the complex numbers to escape the
current scope, but the resulting globalization can again be removed once the
usage is visible.

.. code-block:: c++

  #include <complex>

  using complex = std::complex<double>;

  void zaxpy(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D)
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  In file included from omp110.cpp:1:
  In file included from /usr/bin/clang/lib/clang/13.0.0/include/openmp_wrappers/complex:27:
  /usr/include/c++/8/complex:328:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^
  /usr/include/c++/8/complex:388:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^

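The ``zaxpy`` kernel above could be exercised with a small driver such as the
following sketch. The driver ``zaxpy_matches_host``, its test values, and the
enclosing ``target data`` region are assumptions added for illustration and
are not part of the original example. The sketch assumes the arrays must be
made present on the device before the kernel dereferences ``X`` and ``Y``.

.. code-block:: c++

  #include <complex>
  #include <vector>

  using complex = std::complex<double>;

  // Same kernel as in the example above.
  void zaxpy(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D)
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

  // Hypothetical driver: fills two arrays, runs the kernel, and compares the
  // result against a reference computed on the host.
  bool zaxpy_matches_host(int N) {
    std::vector<complex> X(N), Y(N), Expected(N);
    const complex D(2.0, 1.0);
    for (int i = 0; i < N; ++i) {
      X[i] = complex(static_cast<double>(i), 1.0);
      Y[i] = complex(1.0, -static_cast<double>(i));
      Expected[i] = D * X[i] + Y[i];
    }

    complex *XPtr = X.data();
    complex *YPtr = Y.data();
    // Assumption: mapping the arrays here makes them present on the device
    // for the target region inside zaxpy, and copies Y back afterwards.
  #pragma omp target data map(to : XPtr[0:N]) map(tofrom : YPtr[0:N])
    zaxpy(XPtr, YPtr, D, N);

    // Use a small tolerance, since device and host arithmetic may round
    // differently.
    for (int i = 0; i < N; ++i)
      if (std::abs(Y[i] - Expected[i]) > 1e-9)
        return false;
    return true;
  }

When built without OpenMP support the pragmas are ignored and the same code
runs sequentially on the host, which is a convenient way to check the driver
logic.
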
Diagnostic Scope
----------------

OpenMP target offloading optimization remark.