Moving globalized variable to the stack. [OMP110]
=================================================

.. _omp110:

This optimization remark indicates that a globalized variable was moved back to
thread-local stack memory on the device. This occurs when the optimization pass
can determine that the variable cannot possibly be shared between threads, so
globalization was unnecessary. Using stack memory is the best-case scenario for
data globalization because the variable can now be stored in fast register
files on the device. This optimization requires full visibility of how each
variable is used.

Globalization typically occurs when a pointer to a thread-local variable escapes
the current scope. Because the OpenMP standard allows such a pointer to be
shared between multiple threads, the compiler must be pessimistic and assume
that it is. This is expensive on target offloading devices that do not allow
threads to share data by default: the data must instead be moved to memory that
can be shared, such as shared or global memory. This optimization moves the
data back from shared or global memory to thread-local stack memory if it is
not actually shared between threads.

Examples
--------

The following code is a trivial example of globalization. The compiler sees
that a pointer to the thread-local variable ``x`` escapes the current scope and
must globalize it, even though doing so is not actually necessary. Fortunately,
this optimization can undo the globalization by looking at how ``x`` is used.

.. code-block:: c++

  void use(int *x) { }

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  omp110.cpp:6:7: remark: Moving globalized variable to the stack. [OMP110]
    int x;
        ^

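For contrast, the following is a minimal sketch of a case where the variable
really is shared, so moving it to the stack would not be valid. The names
``bump`` and ``count_threads`` are hypothetical and are not part of the example
above. If ``count_threads`` is called from inside a target region, ``x`` is
written by every thread of the nested parallel region, so we would not expect
an OMP110 remark for it. What the compiler reports instead, if anything, is not
shown here and may depend on the compiler version.

.. code-block:: c++

  // Hypothetical counterpart to use() above: increments the pointee.
  void bump(int *x) { *x += 1; }

  // Intended to be called from inside a target region, like foo() above.
  // x is declared once by the encountering thread and is then updated by every
  // thread of the nested parallel region, so its storage must be visible to
  // all of them.
  int count_threads() {
    int x = 0;
  #pragma omp parallel
    {
  #pragma omp critical
      bump(&x);
    }
    return x;
  }

When compiled without ``-fopenmp`` the pragmas are ignored and the function
simply returns 1.
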
A less trivial example can be seen using C++'s complex numbers. In this case the
overloaded arithmetic operators cause pointers to the complex numbers to escape
the current scope, but the globalization can again be removed once the usage is
visible.

.. code-block:: c++

  #include <complex>

  using complex = std::complex<double>;

  void zaxpy(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D)
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 omp110.cpp -O1 -Rpass=openmp-opt
  In file included from omp110.cpp:1:
  In file included from /usr/bin/clang/lib/clang/13.0.0/include/openmp_wrappers/complex:27:
  /usr/include/c++/8/complex:328:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^
  /usr/include/c++/8/complex:388:20: remark: Moving globalized variable to the stack. [OMP110]
        complex<_Tp> __r = __x;
                     ^

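The ``zaxpy`` example above is enough to trigger the remark, but it has no
``map`` clauses, so it relies on ``X`` and ``Y`` already being accessible on the
device. A minimal sketch of a self-contained variant follows, assuming that
array-section ``map`` clauses are an acceptable way to transfer the data. The
names ``zaxpy_mapped`` and ``run_zaxpy`` are hypothetical, and the remarks this
variant produces are not shown because they were not verified.

.. code-block:: c++

  #include <complex>
  #include <vector>

  using complex = std::complex<double>;

  // Same loop as zaxpy above, with explicit map clauses added (an assumption
  // of this sketch) so that both arrays are available on the device.
  void zaxpy_mapped(complex *X, complex *Y, const complex D, int N) {
  #pragma omp target teams distribute parallel for firstprivate(D) \
      map(to : X[0 : N]) map(tofrom : Y[0 : N])
    for (int i = 0; i < N; ++i)
      Y[i] = D * X[i] + Y[i];
  }

  // Hypothetical helper: runs the kernel on small constant inputs and returns
  // the updated Y so the result can be inspected on the host.
  std::vector<complex> run_zaxpy(int N) {
    std::vector<complex> X(N, complex(1.0, 2.0));
    std::vector<complex> Y(N, complex(0.5, -1.0));
    zaxpy_mapped(X.data(), Y.data(), complex(2.0, 0.0), N);
    return Y;
  }

Without ``-fopenmp`` the pragma is ignored and the loop runs on the host, which
makes the arithmetic easy to check before offloading.
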
Diagnostic Scope
----------------

OpenMP target offloading optimization remark.