==============================================
LLVM Atomic Instructions and Concurrency Guide
==============================================

.. contents::
   :local:

Introduction
============

LLVM supports instructions which are well-defined in the presence of threads and
asynchronous signals.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The C++11 ``<atomic>`` header. (`C++11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)

* gcc-compatible ``__sync_*`` builtins. (`Description
  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)

* Other scenarios with atomic semantics, including ``static`` variables with
  non-trivial constructors in C++.

Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
which ensures that every volatile load and store happens and is performed in the
stated order. A couple of examples: if a SequentiallyConsistent store is
immediately followed by another SequentiallyConsistent store to the same
address, the first store can be erased. This transformation is not allowed for a
pair of volatile stores. On the other hand, a non-volatile non-atomic load can
be moved across a volatile load freely, but not an Acquire load.

This document is intended to provide a guide for anyone either writing a
frontend for LLVM or working on optimization passes for LLVM on how to deal
with instructions with special semantics in the presence of concurrency. This
is not intended to be a precise guide to the semantics; the details can get
extremely complicated and unreadable, and are not usually necessary.

.. _Optimization outside atomic:

Optimization outside atomic
===========================

The basic ``'load'`` and ``'store'`` allow a variety of optimizations, but can
lead to undefined results in a concurrent environment; see `NotAtomic`_. This
section specifically goes into the one optimizer restriction which applies in
concurrent environments, which gets a bit more of an extended description
because any optimization dealing with stores needs to be aware of it.

From the optimizer's point of view, the rule is that if there are not any
instructions with atomic ordering involved, concurrency does not matter, with
one exception: if a variable might be visible to another thread or signal
handler, a store cannot be inserted along a path where it would not otherwise
execute. Take the following example:

.. code-block:: c

  /* C code, for readability; run through clang -O2 -S -emit-llvm to get
     equivalent IR */
  int x;
  void f(int* a) {
    for (int i = 0; i < 100; i++) {
      if (a[i])
        x += 1;
    }
  }

The following is equivalent in non-concurrent situations:

.. code-block:: c

  int x;
  void f(int* a) {
    int xtemp = x;
    for (int i = 0; i < 100; i++) {
      if (a[i])
        xtemp += 1;
    }
    x = xtemp;
  }

However, LLVM is not allowed to transform the former to the latter: it could
indirectly introduce undefined behavior if another thread can access ``x`` at
the same time. (This example is particularly of interest because before the
concurrency model was implemented, LLVM would perform this transformation.)

Note that speculative loads are allowed; a load which is part of a race returns
``undef``, but does not have undefined behavior.

Atomic instructions
===================

For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see `Atomic orderings`_.

``load atomic`` and ``store atomic`` provide the same basic functionality as
non-atomic loads and stores, but provide additional guarantees in situations
where threads and signals are involved.

``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store.
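
As a rough illustration, the C11 operations that Clang lowers to ``atomicrmw``
and ``cmpxchg`` can be sketched as follows; the function names here are
illustrative, not part of any API:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Lowered to an 'atomicrmw add' instruction: the load of the old value,
   the addition, and the store happen as one indivisible operation. */
int increment(atomic_int *counter) {
  return atomic_fetch_add(counter, 1);
}

/* Lowered to a 'cmpxchg' instruction: the store of 1 happens only if
   *flag still holds the expected value 0 at the moment of the exchange. */
bool try_claim(atomic_int *flag) {
  int expected = 0;
  return atomic_compare_exchange_strong(flag, &expected, 1);
}
```

The "no other memory operation in between" property is exactly what makes
``try_claim`` usable as a mutual-exclusion primitive: at most one thread can
ever see the exchange succeed.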

A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
A Monotonic load followed by an Acquire fence is roughly equivalent to an
Acquire load, and a Monotonic store following a Release fence is roughly
equivalent to a Release store. SequentiallyConsistent fences behave as both
an Acquire and a Release fence, and also offer some additional complicated
guarantees; see the C++11 standard for details.
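
In C11 terms, which map directly onto the IR constructs above, the rough
equivalence can be sketched like this; ``flag`` is a hypothetical shared
variable used only for illustration:

```c
#include <stdatomic.h>

atomic_int flag;

/* Roughly equivalent to an Acquire load of 'flag'. */
int load_acquire_via_fence(void) {
  int v = atomic_load_explicit(&flag, memory_order_relaxed); /* Monotonic */
  atomic_thread_fence(memory_order_acquire);                 /* Acquire fence */
  return v;
}

/* Roughly equivalent to a Release store to 'flag'. */
void store_release_via_fence(int v) {
  atomic_thread_fence(memory_order_release);                 /* Release fence */
  atomic_store_explicit(&flag, v, memory_order_relaxed);     /* Monotonic */
}
```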

Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
therefore an instruction which is wider than the target natively supports can be
impossible to generate.
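
At the C11 level, the same width constraint can be probed with the standard
``atomic_is_lock_free`` predicate; this sketch assumes nothing beyond the
standard library, and its result is target-dependent:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns whether 64-bit atomics are natively lock-free on this target.
   A frontend consults analogous target information before emitting an
   atomic instruction of a given width. */
bool has_lock_free_64(void) {
  _Atomic uint64_t probe;
  return atomic_is_lock_free(&probe);
}
```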

.. _Atomic orderings:

Atomic orderings
================

In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength; each
level includes all the guarantees of the previous level except for
Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)

.. _NotAtomic:

NotAtomic
---------

NotAtomic is the obvious: a load or store which is not atomic. (This isn't
really a level of atomicity, but is listed here for comparison.) This is
essentially a regular load or store. If there is a race on a given memory
location, loads from that location return ``undef``.

Relevant standard
  This is intended to match shared variables in C/C++, and to be used in any
  other context where memory access is necessary, and a race is impossible. (The
  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)

Notes for frontends
  The rule is essentially that all memory accessed with basic loads and stores
  by multiple threads should be protected by a lock or other synchronization;
  otherwise, you are likely to run into undefined behavior. If your frontend is
  for a "safe" language like Java, use Unordered to load and store any shared
  variable. Note that NotAtomic volatile loads and stores are not properly
  atomic; do not try to use them as a substitute. (Per the C/C++ standards,
  volatile does provide some limited guarantees around asynchronous signals, but
  atomics are generally a better solution.)

Notes for optimizers
  Introducing loads to shared variables along a codepath where they would not
  otherwise exist is allowed; introducing stores to shared variables is not. See
  `Optimization outside atomic`_.

Notes for code generation
  The one interesting restriction here is that it is not allowed to write to
  bytes outside of the bytes relevant to a store. This is mostly relevant to
  unaligned stores: it is not allowed in general to convert an unaligned store
  into two aligned stores of the same width as the unaligned store. Backends are
  also expected to generate an i8 store as an i8 store, and not an instruction
  which writes to surrounding bytes. (If you are writing a backend for an
  architecture which cannot satisfy these restrictions and cares about
  concurrency, please send an email to llvm-dev.)

Unordered
---------

Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
guarantees the operation to be lock-free, so it does not depend on the data
being part of a special atomic structure or depend on a separate per-process
global lock. Note that code generation will fail for unsupported atomic
operations; if you need such an operation, use explicit locking.

Relevant standard
  This is intended to match the Java memory model for shared variables.

Notes for frontends
  This cannot be used for synchronization, but is useful for Java and other
  "safe" languages which need to guarantee that the generated code never
  exhibits undefined behavior. Note that this guarantee is cheap on common
  platforms for loads of a native width, but can be expensive or unavailable for
  wider loads, like a 64-bit store on ARM. (A frontend for Java or other "safe"
  languages would normally split a 64-bit store on ARM into two 32-bit unordered
  stores.)

Notes for optimizers
  In terms of the optimizer, this prohibits any transformation that transforms a
  single load into multiple loads, transforms a store into multiple stores,
  narrows a store, or stores a value which would not be stored otherwise. Some
  examples of unsafe optimizations are narrowing an assignment into a bitfield,
  rematerializing a load, and turning loads and stores into a memcpy
  call. Reordering unordered operations is safe, though, and optimizers should
  take advantage of that because unordered operations are common in languages
  that need them.

Notes for code generation
  These operations are required to be atomic in the sense that if you use
  unordered loads and unordered stores, a load cannot see a value which was
  never stored. A normal load or store instruction is usually sufficient, but
  note that an unordered load or store cannot be split into multiple
  instructions (or an instruction which does multiple memory operations, like
  ``LDRD`` on ARM without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).

Monotonic
---------

Monotonic is the weakest level of atomicity that can be used in synchronization
primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
  standards for the exact definition.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution. The
  guarantees in terms of synchronization are very weak, so make sure these are
  only used in a pattern which you know is correct. Generally, these would
  either be used for atomic operations which do not protect other memory (like
  an atomic counter), or along with a ``fence``.
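
The atomic-counter pattern mentioned above can be sketched in C11, whose
``memory_order_relaxed`` Clang lowers to Monotonic operations; the names are
illustrative:

```c
#include <stdatomic.h>

/* A statistics counter: it protects no other memory, so Monotonic
   (memory_order_relaxed) ordering is sufficient and cheap. */
static atomic_long hits;

void record_hit(void) {
  atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

long read_hits(void) {
  return atomic_load_explicit(&hits, memory_order_relaxed);
}
```

No increment is ever lost even under contention; what Monotonic does not
promise is any ordering between the counter and other memory.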

Notes for optimizers
  In terms of the optimizer, this can be treated as a read+write on the relevant
  memory location (and alias analysis will take advantage of that). In addition,
  it is legal to reorder non-atomic and Unordered loads around Monotonic
  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
  operations are unlikely to be used in ways which would make those
  optimizations useful.

Notes for code generation
  Code generation for loads and stores is essentially the same as that for
  Unordered. No fences are required. ``cmpxchg`` and ``atomicrmw`` are required
  to appear as a single operation.

Acquire
-------

Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
  used for C++11/C11 ``memory_order_consume``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.
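
The Acquire/Release pairing is the classic message-passing idiom, sketched here
in C11 (whose acquire/release orderings Clang lowers to the Acquire and Release
IR orderings); ``data`` and ``ready`` are illustrative names:

```c
#include <stdatomic.h>
#include <stdbool.h>

int data;            /* plain, non-atomic payload */
atomic_bool ready;

/* Writer thread: the Release store makes the earlier write to 'data'
   visible to any thread whose Acquire load observes ready == true. */
void publish(int value) {
  data = value;
  atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader thread: reading 'data' is safe only after the Acquire load
   has observed the flag. */
bool try_consume(int *out) {
  if (!atomic_load_explicit(&ready, memory_order_acquire))
    return false;
  *out = data;
  return true;
}
```

Neither side synchronizes anything on its own; it is the paired
Release-store/Acquire-load on the same location that orders the payload access.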

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move stores from before an Acquire load or read-modify-write
  operation to after it, and move non-Acquire loads from before an Acquire
  operation to after it.

Notes for code generation
  Architectures with weak memory ordering (essentially everything relevant today
  except x86 and SPARC) require some sort of fence to maintain the Acquire
  semantics. The precise fences required vary widely by architecture, but for
  a simple implementation, most architectures provide a barrier which is strong
  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.). Putting
  such a fence after the equivalent Monotonic operation is sufficient to
  maintain Acquire semantics for a memory operation.


Release
-------

Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.
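
The pairing requirement can be seen in a C++11 analogue (a sketch; the names
here are illustrative): the consumer only gets a guarantee about ``payload``
because its Acquire load pairs with the producer's Release store.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;

// The Release store "publishes" payload; the Acquire load that observes
// ready == true synchronizes-with it. A relaxed load here would provide no
// such guarantee.
int handoff() {
  std::thread producer([] {
    payload = 7;
    ready.store(true, std::memory_order_release);
  });
  int result = 0;
  std::thread consumer([&] {
    while (!ready.load(std::memory_order_acquire)) {
      // spin until the flag is observed
    }
    result = payload;  // guaranteed to be 7
  });
  producer.join();
  consumer.join();
  return result;
}
```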

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move loads from after a Release store or read-modify-write
  operation to before it, and move non-Release stores from after a Release
  operation to before it.

Notes for code generation
  See the section on Acquire; a fence before the relevant operation is usually
  sufficient for Release. Note that a store-store fence is not sufficient to
  implement Release semantics; store-store fences are generally not exposed to
  IR because they are extremely difficult to use correctly.

AcquireRelease
--------------

AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.

Notes for optimizers
  In general, optimizers should treat this like a nothrow call; the possible
  optimizations are usually not interesting.

Notes for code generation
  This operation has Acquire and Release semantics; see the sections on Acquire
  and Release.

SequentiallyConsistent
----------------------

SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile,
  and the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
  generally a practical performance tradeoff.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. For
  SequentiallyConsistent loads and stores, the same reorderings are allowed as
  for Acquire loads and Release stores, except that SequentiallyConsistent
  operations may not be reordered with each other.

Notes for code generation
  SequentiallyConsistent loads minimally require the same barriers as Acquire
  operations and SequentiallyConsistent stores require Release barriers.
  Additionally, the code generator must enforce ordering between
  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
  is usually done by emitting either a full fence before the loads or a full
  fence after the stores; which is preferred varies by architecture.
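
The extra guarantee over Acquire/Release -- the single total order -- is what
the classic store/load test observes. A C++ sketch (illustrative names):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};

// Each thread stores to one flag and then loads the other. Under seq_cst the
// single total order forbids both threads from reading 0; with only
// acquire/release ordering, the "both read 0" outcome would be allowed.
bool at_least_one_observed() {
  x.store(0);
  y.store(0);
  int r1 = -1, r2 = -1;
  std::thread t1([&] {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
  });
  std::thread t2([&] {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
  });
  t1.join();
  t2.join();
  return r1 == 1 || r2 == 1;  // never both 0 under seq_cst
}
```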

Atomics and IR optimization
===========================

Predicates for optimizer writers to query:

* ``isSimple()``: A load or store which is not volatile or atomic. This is
  what, for example, memcpyopt would check for operations it might transform.

* ``isUnordered()``: A load or store which is not volatile and at most
  Unordered. This would be checked, for example, by LICM before hoisting an
  operation.

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* ``isStrongerThan`` / ``isAtLeastOrStrongerThan``: These are predicates on
  orderings. They can be useful for passes that are aware of atomics, for
  example to do DSE across a single atomic access, but not across a
  release-acquire pair (see MemoryDependencyAnalysis for an example of this).

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.

To support optimizing around atomic operations, make sure you are using the
right predicates; everything should work if that is done. If your pass should
optimize some atomic operations (Unordered operations in particular), make sure
it doesn't replace an atomic load or store with a non-atomic operation.

Some examples of how optimizations interact with various kinds of atomic
operations:

* ``memcpyopt``: An atomic operation cannot be optimized into part of a
  memcpy/memset, including unordered loads/stores. It can pull operations
  across some atomic operations.

* LICM: Unordered loads/stores can be moved out of a loop. It just treats
  monotonic operations like a read+write to a memory location, and anything
  stricter than that like a nothrow call.

* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
  important. It is possible in some cases for DSE to operate across a stronger
  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
  MemoryDependencyAnalysis (which is also used by other passes like GVN).

* Folding a load: Any atomic load from a constant global can be constant-folded,
  because it cannot be observed. Similar reasoning allows SROA with
  atomic loads and stores.

Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
``setInsertFencesForAtomic()`` was used.

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively. This should get fixed at some point.

One very important property of the atomic operations is that if your backend
supports any inline lock-free atomic operations of a given size, you should
support *ALL* operations of that size in a lock-free manner.

When the target implements atomic ``cmpxchg`` or LL/SC instructions (as most do)
this is trivial: all the other operations can be implemented on top of those
primitives. However, on many older CPUs (e.g. ARMv5, SparcV8, Intel 80386)
there are atomic load and store instructions, but no ``cmpxchg`` or LL/SC. As
it is invalid to implement ``atomic load`` using the native instruction, but
``cmpxchg`` using a library call to a function that uses a mutex, ``atomic
load`` must *also* expand to a library call on such architectures, so that it
can remain atomic with regards to a simultaneous ``cmpxchg``, by using the same
mutex.

AtomicExpandPass can help with that: it will expand all atomic operations to the
proper ``__atomic_*`` libcalls for any size above the maximum set by
``setMaxAtomicSizeInBitsSupported`` (which defaults to 0).

On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated. ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``. Depending
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.

On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.).

It is often easiest for backends to use AtomicExpandPass to lower some of the
atomic constructs. Here are some lowerings it can do:

* cmpxchg -> loop with load-linked/store-conditional
  by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``, and
  ``emitStoreConditional()``
* large loads/stores -> ll-sc/cmpxchg
  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
* strong atomic accesses -> monotonic accesses + fences by overriding
  ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and
  ``emitTrailingFence()``
* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
  by overriding ``expandAtomicRMWInIR()``
* expansion to ``__atomic_*`` libcalls for unsupported sizes

For an example of all of these, look at the ARM backend.
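
The "atomic rmw -> loop with cmpxchg" lowering has a simple source-level
analogue: any read-modify-write can be built from a compare-exchange loop. A
sketch (illustrative, not the pass's actual output), here for the ``nand``
operation, which x86 also has to expand this way:

```cpp
#include <atomic>

// Implements the equivalent of "atomicrmw nand" with a compare-exchange loop,
// mirroring how AtomicExpandPass expands RMW operations the target cannot do
// natively. compare_exchange_weak reloads `expected` on failure, so the loop
// always retries with a freshly observed value.
unsigned atomic_nand(std::atomic<unsigned> &a, unsigned operand) {
  unsigned expected = a.load(std::memory_order_relaxed);
  while (!a.compare_exchange_weak(expected, ~(expected & operand),
                                  std::memory_order_seq_cst,
                                  std::memory_order_relaxed)) {
    // `expected` now holds the value that caused the failure; retry.
  }
  return expected;  // the old value, as atomicrmw returns
}
```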

Libcalls: __atomic_*
====================

There are two kinds of atomic library calls that are generated by LLVM. Please
note that both sets of library functions somewhat confusingly share the names of
builtin functions defined by clang. Despite this, the library functions are
not directly related to the builtins: it is *not* the case that ``__atomic_*``
builtins lower to ``__atomic_*`` library calls and ``__sync_*`` builtins lower
to ``__sync_*`` library calls.

The first set of library functions are named ``__atomic_*``. This set has been
"standardized" by GCC, and is described below. (See also `GCC's documentation
<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>`_.)

LLVM's AtomicExpandPass will translate atomic operations on data sizes above
``MaxAtomicSizeInBitsSupported`` into calls to these functions.

There are four generic functions, which can be called with data of any size or
alignment::

  void __atomic_load(size_t size, void *ptr, void *ret, int ordering)
  void __atomic_store(size_t size, void *ptr, void *val, int ordering)
  void __atomic_exchange(size_t size, void *ptr, void *val, void *ret, int ordering)
  bool __atomic_compare_exchange(size_t size, void *ptr, void *expected, void *desired, int success_order, int failure_order)

There are also size-specialized versions of the above functions, which can only
be used with *naturally-aligned* pointers of the appropriate size. In the
signatures below, "N" is one of 1, 2, 4, 8, and 16, and "iN" is the appropriate
integer type of that size; if no such integer type exists, the specialization
cannot be used::

  iN __atomic_load_N(iN *ptr, int ordering)
  void __atomic_store_N(iN *ptr, iN val, int ordering)
  iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
  bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order)

Finally there are some read-modify-write functions, which are only available in
the size-specific variants (any other sizes use a ``__atomic_compare_exchange``
loop)::

  iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering)

This set of library functions has some interesting implementation requirements
to take note of:

- They support all sizes and alignments -- including those which cannot be
  implemented natively on any existing hardware. Therefore, they will certainly
  use mutexes for some sizes/alignments.

- As a consequence, they cannot be shipped in a statically linked
  compiler-support library, as they have state which must be shared amongst all
  DSOs loaded in the program. They must be provided in a shared library used by
  all objects.

- The set of atomic sizes supported lock-free must be a superset of the sizes
  any compiler can emit. That is: if a new compiler introduces support for
  inline-lock-free atomics of size N, the ``__atomic_*`` functions must also
  have a lock-free implementation for size N. This is a requirement so that
  code produced by an old compiler (which will have called the ``__atomic_*``
  function) interoperates with code produced by the new compiler (which will
  use the native atomic instruction).

Note that it's possible to write an entirely target-independent implementation
of these library functions by using the compiler atomic builtins themselves to
implement the operations on naturally-aligned pointers of supported sizes, and a
generic mutex implementation otherwise.
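
A minimal sketch of the mutex-based fallback path for the generic calls (the
function names and the single global lock are hypothetical; a real libatomic
typically hashes the address into a table of locks, and uses lock-free builtins
for supported sizes):

```cpp
#include <cstddef>
#include <cstring>
#include <mutex>

// Hypothetical fallback for the generic __atomic_load/__atomic_store libcalls:
// one global mutex guards every access of an unsupported size, so a
// concurrent load and store of the same object serialize on the same lock.
static std::mutex g_atomic_lock;

void fallback_atomic_load(std::size_t size, const void *ptr, void *ret) {
  std::lock_guard<std::mutex> guard(g_atomic_lock);
  std::memcpy(ret, ptr, size);  // copy the current value out under the lock
}

void fallback_atomic_store(std::size_t size, void *ptr, const void *val) {
  std::lock_guard<std::mutex> guard(g_atomic_lock);
  std::memcpy(ptr, val, size);  // publish the new value under the lock
}
```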

Libcalls: __sync_*
==================

Some targets or OS/target combinations can support lock-free atomics, but for
various reasons, it is not practical to emit the instructions inline.

There are two typical examples of this.

Some CPUs support multiple instruction sets which can be switched back and forth
on function-call boundaries. For example, MIPS supports the MIPS16 ISA, which
has a smaller instruction encoding than the usual MIPS32 ISA. ARM, similarly,
has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic
instructions are not encodable. However, those instructions are available via a
function call to a function with the longer encoding.

Additionally, a few OS/target pairs provide kernel-supported lock-free
atomics. ARM/Linux is an example of this: the kernel `provides
<https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt>`_ a
function which on older CPUs contains a "magically-restartable" atomic sequence
(which looks atomic so long as there's only one CPU), and contains actual atomic
instructions on newer multicore models. This sort of functionality can
typically be provided on any architecture, if all CPUs which are missing atomic
compare-and-swap support are uniprocessor (no SMP). This is almost always the
case. The only common architecture without that property is SPARC -- SPARCV8
SMP systems were common, yet it doesn't support any sort of compare-and-swap
operation.

In either of these cases, the Target in LLVM can claim support for atomics of an
appropriate size, and then implement some subset of the operations via libcalls
to a ``__sync_*`` function. Such functions *must* not use locks in their
implementation, because unlike the ``__atomic_*`` routines used by
AtomicExpandPass, these may be mixed-and-matched with native instructions by the
target lowering.

Further, these routines do not need to be shared, as they are stateless. So,
there is no issue with having multiple copies included in one binary. Thus,
typically these routines are implemented by the statically-linked compiler
runtime support library.

LLVM will emit a call to an appropriate ``__sync_*`` routine if the target
ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``, ``ATOMIC_SWAP``,
or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted into the
availability of those library functions via a call to ``initSyncLibcalls()``.

The full set of functions that may be called by LLVM is (for ``N`` being 1, 2,
4, 8, or 16)::

  iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired)
  iN __sync_lock_test_and_set_N(iN *ptr, iN val)
  iN __sync_fetch_and_add_N(iN *ptr, iN val)
  iN __sync_fetch_and_sub_N(iN *ptr, iN val)
  iN __sync_fetch_and_and_N(iN *ptr, iN val)
  iN __sync_fetch_and_or_N(iN *ptr, iN val)
  iN __sync_fetch_and_xor_N(iN *ptr, iN val)
  iN __sync_fetch_and_nand_N(iN *ptr, iN val)
  iN __sync_fetch_and_max_N(iN *ptr, iN val)
  iN __sync_fetch_and_umax_N(iN *ptr, iN val)
  iN __sync_fetch_and_min_N(iN *ptr, iN val)
  iN __sync_fetch_and_umin_N(iN *ptr, iN val)

This list doesn't include any function for atomic load or store; all known
architectures support atomic loads and stores directly (possibly by emitting a
fence on either side of a normal load or store).

There's also, somewhat separately, the possibility to lower ``ATOMIC_FENCE`` to
``__sync_synchronize()``. This may happen or not happen independently of all
the above, controlled purely by ``setOperationAction(ISD::ATOMIC_FENCE, ...)``.