view openmp/docs/remarks/OMP112.rst @ 266:00f31e85ec16 default tip

Added tag current for changeset 31d058e83c98
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Sat, 14 Oct 2023 10:13:55 +0900
parents c4bab56944e8
children
line wrap: on
line source

.. _omp112:

Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]
=====================================================================================================

This missed remark indicates that a globalized value was found on the target
device that was not either replaced with stack memory by :ref:`OMP110 <omp110>`
or shared memory by :ref:`OMP111 <omp111>`. Globalization that has not been
removed will need to be handled by the runtime and will significantly impact
performance.

The OpenMP standard requires that threads are able to share their data between
each-other. However, this is not true by default when offloading to a target
device such as a GPU. Threads on a GPU cannot shared their data unless it is
first placed in global or shared memory. In order to create standards complaint
code, the Clang compiler will globalize any variables that could potentially be
shared between the threads. In the majority of cases, globalized variables can
either be returns to a thread-local stack, or pushed to shared memory. However,
in a few cases it is necessary and will cause a performance penalty.

Examples
--------

This example shows legitimate data sharing on the device. It is a convoluted
example, but is completely complaint with the OpenMP standard. If globalization
was not added this would result in different results on different target
devices.

.. code-block:: c++

  #include <omp.h>
  #include <cstdio>

  #pragma omp declare target
  static int *p;
  #pragma omp end declare target

  void foo() {
    int x = omp_get_thread_num();
    if (omp_get_thread_num() == 1)
      p = &x;

  #pragma omp barrier

    printf ("Thread %d: %d\n", omp_get_thread_num(), *p);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
  int x = omp_get_thread_num();
      ^

A less convoluted example globalization that cannot be removed occurs when
calling functions that aren't visible from the current translation unit.

.. code-block:: c++

  extern void use(int *x);

  void foo() {
    int x;
    use(&x);
  }

  int main() {
  #pragma omp target parallel
    foo();
  }

.. code-block:: console

  $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
  omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
  due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
  int x;
      ^

Diagnostic Scope
----------------

OpenMP target offloading missed remark.