comparison docs/Atomics.rst @ 83:60c9769439b8 LLVM3.7
author | Tatsuki IHA <e125716@ie.u-ryukyu.ac.jp> |
---|---|
date | Wed, 18 Feb 2015 14:55:36 +0900 |
parents | 54457678186b |
children | afa8332a0e37 |
78:af83660cff7b | 83:60c9769439b8 |
---|---|
16 clarified in the IR. | 16 clarified in the IR. |
17 | 17 |
18 The atomic instructions are designed specifically to provide readable IR and | 18 The atomic instructions are designed specifically to provide readable IR and |
19 optimized code generation for the following: | 19 optimized code generation for the following: |
20 | 20 |
21 * The new C++0x ``<atomic>`` header. (`C++0x draft available here | 21 * The new C++11 ``<atomic>`` header. (`C++11 draft available here |
22 <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here | 22 <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here |
23 <http://www.open-std.org/jtc1/sc22/wg14/>`_.) | 23 <http://www.open-std.org/jtc1/sc22/wg14/>`_.) |
24 | 24 |
25 * Proper semantics for Java-style memory, for both ``volatile`` and regular | 25 * Proper semantics for Java-style memory, for both ``volatile`` and regular |
26 shared variables. (`Java Specification | 26 shared variables. (`Java Specification |
27 <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_) | 27 <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_) |
113 memory operation can happen on any thread between the load and store. | 113 memory operation can happen on any thread between the load and store. |
114 | 114 |
115 A ``fence`` provides Acquire and/or Release ordering which is not part of | 115 A ``fence`` provides Acquire and/or Release ordering which is not part of |
116 another operation; it is normally used along with Monotonic memory operations. | 116 another operation; it is normally used along with Monotonic memory operations. |
117 A Monotonic load followed by an Acquire fence is roughly equivalent to an | 117 A Monotonic load followed by an Acquire fence is roughly equivalent to an |
118 Acquire load. | 118 Acquire load, and a Monotonic store following a Release fence is roughly |
119 equivalent to a Release store. SequentiallyConsistent fences behave as both | |
120 an Acquire and a Release fence, and offer some additional complicated | |
121 guarantees; see the C++11 standard for details. | |
119 | 122 |
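A minimal LLVM IR sketch of the fence pairing described above (the function and value names are illustrative, not from the original document):

.. code-block:: llvm

    define i32 @load_then_acquire_fence(i32* %ptr) {
      ; A Monotonic load followed by an Acquire fence is roughly
      ; equivalent to a single Acquire load of %ptr.
      %val = load atomic i32, i32* %ptr monotonic, align 4
      fence acquire
      ret i32 %val
    }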
120 Frontends generating atomic instructions generally need to be aware of the | 123 Frontends generating atomic instructions generally need to be aware of the |
121 target to some degree; atomic instructions are guaranteed to be lock-free, and | 124 target to some degree; atomic instructions are guaranteed to be lock-free, and |
122 therefore an instruction which is wider than the target natively supports can be | 125 therefore an instruction which is wider than the target natively supports can be |
123 impossible to generate. | 126 impossible to generate. |
175 Unordered | 178 Unordered |
176 --------- | 179 --------- |
177 | 180 |
178 Unordered is the lowest level of atomicity. It essentially guarantees that races | 181 Unordered is the lowest level of atomicity. It essentially guarantees that races |
179 produce somewhat sane results instead of having undefined behavior. It also | 182 produce somewhat sane results instead of having undefined behavior. It also |
180 guarantees the operation to be lock-free, so it do not depend on the data being | 183 guarantees the operation to be lock-free, so it does not depend on the data |
181 part of a special atomic structure or depend on a separate per-process global | 184 being part of a special atomic structure or depend on a separate per-process |
182 lock. Note that code generation will fail for unsupported atomic operations; if | 185 global lock. Note that code generation will fail for unsupported atomic |
183 you need such an operation, use explicit locking. | 186 operations; if you need such an operation, use explicit locking. |
184 | 187 |
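As an illustrative sketch (names are hypothetical), an Unordered load in IR: racing accesses still yield some value that was actually stored, rather than undefined behavior, and the operation stays lock-free.

.. code-block:: llvm

    define i32 @read_shared(i32* %shared) {
      ; Unordered: lock-free, but with no ordering guarantees beyond
      ; returning some previously stored value of %shared.
      %val = load atomic i32, i32* %shared unordered, align 4
      ret i32 %val
    }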
185 Relevant standard | 188 Relevant standard |
186 This is intended to match the Java memory model for shared variables. | 189 This is intended to match the Java memory model for shared variables. |
187 | 190 |
188 Notes for frontends | 191 Notes for frontends |
219 primitives, although it does not provide any general synchronization. It | 222 primitives, although it does not provide any general synchronization. It |
220 essentially guarantees that if you take all the operations affecting a specific | 223 essentially guarantees that if you take all the operations affecting a specific |
221 address, a consistent ordering exists. | 224 address, a consistent ordering exists. |
222 | 225 |
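A small sketch (names are illustrative) of a Monotonic read-modify-write, the typical fit for a statistics counter where only per-address consistency matters:

.. code-block:: llvm

    define void @bump_counter(i32* %counter) {
      ; Monotonic (C11/C++11 memory_order_relaxed): a consistent order
      ; exists for operations on %counter, but nothing else is ordered.
      %old = atomicrmw add i32* %counter, i32 1 monotonic
      ret void
    }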
223 Relevant standard | 226 Relevant standard |
224 This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those | 227 This corresponds to the C++11/C11 ``memory_order_relaxed``; see those |
225 standards for the exact definition. | 228 standards for the exact definition. |
226 | 229 |
227 Notes for frontends | 230 Notes for frontends |
228 If you are writing a frontend which uses this directly, use with caution. The | 231 If you are writing a frontend which uses this directly, use with caution. The |
229 guarantees in terms of synchronization are very weak, so make sure these are | 232 guarantees in terms of synchronization are very weak, so make sure these are |
249 | 252 |
250 Acquire provides a barrier of the sort necessary to acquire a lock to access | 253 Acquire provides a barrier of the sort necessary to acquire a lock to access |
251 other memory with normal loads and stores. | 254 other memory with normal loads and stores. |
252 | 255 |
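An illustrative sketch of an Acquire load of a guard flag (names are hypothetical); it pairs with the Release store sketched in the next section.

.. code-block:: llvm

    define i32 @wait_for_flag(i32* %flag) {
      ; Acquire: later loads and stores may not be reordered before
      ; this load, which is what entering a critical section needs.
      %f = load atomic i32, i32* %flag acquire, align 4
      ret i32 %f
    }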
253 Relevant standard | 256 Relevant standard |
254 This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be | 257 This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be |
255 used for C++0x/C1x ``memory_order_consume``. | 258 used for C++11/C11 ``memory_order_consume``. |
256 | 259 |
257 Notes for frontends | 260 Notes for frontends |
258 If you are writing a frontend which uses this directly, use with caution. | 261 If you are writing a frontend which uses this directly, use with caution. |
259 Acquire only provides a semantic guarantee when paired with a Release | 262 Acquire only provides a semantic guarantee when paired with a Release |
260 operation. | 263 operation. |
279 | 282 |
280 Release is similar to Acquire, but with a barrier of the sort necessary to | 283 Release is similar to Acquire, but with a barrier of the sort necessary to |
281 release a lock. | 284 release a lock. |
282 | 285 |
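A matching illustrative sketch of a Release store publishing a flag (names are hypothetical), pairing with the Acquire load above:

.. code-block:: llvm

    define void @publish_flag(i32* %flag) {
      ; Release: earlier loads and stores may not be reordered after
      ; this store, which is what leaving a critical section needs.
      store atomic i32 1, i32* %flag release, align 4
      ret void
    }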
283 Relevant standard | 286 Relevant standard |
284 This corresponds to the C++0x/C1x ``memory_order_release``. | 287 This corresponds to the C++11/C11 ``memory_order_release``. |
285 | 288 |
286 Notes for frontends | 289 Notes for frontends |
287 If you are writing a frontend which uses this directly, use with caution. | 290 If you are writing a frontend which uses this directly, use with caution. |
288 Release only provides a semantic guarantee when paired with an Acquire | 291 Release only provides a semantic guarantee when paired with an Acquire |
289 operation. | 292 operation. |
305 | 308 |
306 AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release | 309 AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release |
307 barrier (for fences and operations which both read and write memory). | 310 barrier (for fences and operations which both read and write memory). |
308 | 311 |
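A brief sketch, with illustrative names, of an AcquireRelease read-modify-write:

.. code-block:: llvm

    define i32 @exchange(i32* %ptr, i32 %new) {
      ; acq_rel: this atomicrmw both reads and writes %ptr, so it
      ; carries an Acquire barrier and a Release barrier at once.
      %old = atomicrmw xchg i32* %ptr, i32 %new acq_rel
      ret i32 %old
    }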
309 Relevant standard | 312 Relevant standard |
310 This corresponds to the C++0x/C1x ``memory_order_acq_rel``. | 313 This corresponds to the C++11/C11 ``memory_order_acq_rel``. |
311 | 314 |
312 Notes for frontends | 315 Notes for frontends |
313 If you are writing a frontend which uses this directly, use with caution. | 316 If you are writing a frontend which uses this directly, use with caution. |
314 Acquire only provides a semantic guarantee when paired with a Release | 317 Acquire only provides a semantic guarantee when paired with a Release |
315 operation, and vice versa. | 318 operation, and vice versa. |
328 SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads | 331 SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads |
329 and Release semantics for stores. Additionally, it guarantees that a total | 332 and Release semantics for stores. Additionally, it guarantees that a total |
330 ordering exists between all SequentiallyConsistent operations. | 333 ordering exists between all SequentiallyConsistent operations. |
331 | 334 |
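An illustrative sketch of SequentiallyConsistent accesses (names are hypothetical); all ``seq_cst`` operations additionally participate in a single total order.

.. code-block:: llvm

    define i32 @seq_cst_pair(i32* %ptr) {
      ; seq_cst: Release semantics for the store, Acquire semantics for
      ; the load, plus one total order over all seq_cst operations.
      store atomic i32 1, i32* %ptr seq_cst, align 4
      %v = load atomic i32, i32* %ptr seq_cst, align 4
      ret i32 %v
    }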
332 Relevant standard | 335 Relevant standard |
333 This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and | 336 This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and |
334 the gcc-compatible ``__sync_*`` builtins which do not specify otherwise. | 337 the gcc-compatible ``__sync_*`` builtins which do not specify otherwise. |
335 | 338 |
336 Notes for frontends | 339 Notes for frontends |
337 If a frontend is exposing atomic operations, these are much easier to reason | 340 If a frontend is exposing atomic operations, these are much easier to reason |
338 about for the programmer than other kinds of operations, and using them is | 341 about for the programmer than other kinds of operations, and using them is |
366 | 369 |
367 * ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note | 370 * ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note |
368 that they return true for any operation which is volatile or at least | 371 that they return true for any operation which is volatile or at least |
369 Monotonic. | 372 Monotonic. |
370 | 373 |
374 * ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on | |
375 orderings. They can be useful for passes that are aware of atomics, for | |
376 example to do DSE across a single atomic access, but not across a | |
377 release-acquire pair (see MemoryDependencyAnalysis for an example of this). | |
378 | |
371 * Alias analysis: Note that AA will return ModRef for anything Acquire or | 379 * Alias analysis: Note that AA will return ModRef for anything Acquire or |
372 Release, and for the address accessed by any Monotonic operation. | 380 Release, and for the address accessed by any Monotonic operation. |
373 | 381 |
374 To support optimizing around atomic operations, make sure you are using the | 382 To support optimizing around atomic operations, make sure you are using the |
375 right predicates; everything should work if that is done. If your pass should | 383 right predicates; everything should work if that is done. If your pass should |
387 monotonic operations like a read+write to a memory location, and anything | 395 monotonic operations like a read+write to a memory location, and anything |
388 stricter than that like a nothrow call. | 396 stricter than that like a nothrow call. |
389 | 397 |
390 * DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can | 398 * DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can |
391 be DSE'ed in some cases, but it's tricky to reason about, and not especially | 399 be DSE'ed in some cases, but it's tricky to reason about, and not especially |
392 important. | 400 important. It is possible in some cases for DSE to operate across a stronger |
401 atomic operation, but it is fairly tricky. DSE delegates this reasoning to | |
402 MemoryDependencyAnalysis (which is also used by other passes like GVN); a sketch of the simple Unordered case follows this list. | |
393 | 403 |
394 * Folding a load: Any atomic load from a constant global can be constant-folded, | 404 * Folding a load: Any atomic load from a constant global can be constant-folded, |
395 because it cannot be observed. Similar reasoning allows scalarrepl with | 405 because it cannot be observed. Similar reasoning allows scalarrepl with |
396 atomic loads and stores. | 406 atomic loads and stores. |
397 | 407 |
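As referenced in the DSE item above, a minimal sketch (illustrative names) of the straightforward case: the first Unordered store is dead because the second overwrites it with no intervening read.

.. code-block:: llvm

    define void @dead_store(i32* %p) {
      ; DSE may delete this store just as it would a normal dead store.
      store atomic i32 1, i32* %p unordered, align 4
      store atomic i32 2, i32* %p unordered, align 4
      ret void
    }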
398 Atomics and Codegen | 408 Atomics and Codegen |
399 =================== | 409 =================== |
400 | 410 |
401 Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes. | 411 Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes. |
402 On architectures which use barrier instructions for all atomic ordering (like | 412 On architectures which use barrier instructions for all atomic ordering (like |
403 ARM), appropriate fences are split out as the DAG is built. | 413 ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if |
414 ``setInsertFencesForAtomic()`` was used. | |
404 | 415 |
405 The MachineMemOperand for all atomic operations is currently marked as volatile; | 416 The MachineMemOperand for all atomic operations is currently marked as volatile; |
406 this is not correct in the IR sense of volatile, but CodeGen handles anything | 417 this is not correct in the IR sense of volatile, but CodeGen handles anything |
407 marked volatile very conservatively. This should get fixed at some point. | 418 marked volatile very conservatively. This should get fixed at some point. |
408 | 419 |
413 implemented in a lock-free manner. It is expected that backends will give an | 424 implemented in a lock-free manner. It is expected that backends will give an |
414 error when given an operation which cannot be implemented. (The LLVM code | 425 error when given an operation which cannot be implemented. (The LLVM code |
415 generator is not very helpful here at the moment, but hopefully that will | 426 generator is not very helpful here at the moment, but hopefully that will |
416 change.) | 427 change.) |
417 | 428 |
418 The implementation of atomics on LL/SC architectures (like ARM) is currently a | |
419 bit of a mess; there is a lot of copy-pasted code across targets, and the | |
420 representation is relatively unsuited to optimization (it would be nice to be | |
421 able to optimize loops involving cmpxchg etc.). | |
422 | |
423 On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores | 429 On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores |
424 generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent | 430 generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent |
425 fences generate an ``MFENCE``, other fences do not cause any code to be | 431 fences generate an ``MFENCE``, other fences do not cause any code to be |
426 generated. cmpxchg uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg`` | 432 generated. cmpxchg uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg`` |
427 uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all | 433 uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all |
433 and SequentiallyConsistent semantics require barrier instructions for every such | 439 and SequentiallyConsistent semantics require barrier instructions for every such |
434 operation. Loads and stores generate normal instructions. ``cmpxchg`` and | 440 operation. Loads and stores generate normal instructions. ``cmpxchg`` and |
435 ``atomicrmw`` can be represented using a loop with LL/SC-style instructions | 441 ``atomicrmw`` can be represented using a loop with LL/SC-style instructions |
436 which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX`` | 442 which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX`` |
437 on ARM, etc.). | 443 on ARM, etc.). |
444 | |
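An IR-level sketch (hypothetical names) of a ``cmpxchg`` of the kind that, per the paragraph above, is represented as an LL/SC loop on targets like ARM:

.. code-block:: llvm

    define i32 @compare_and_swap(i32* %ptr, i32 %expected, i32 %new) {
      ; On LL/SC architectures this becomes a LDREX/STREX retry loop.
      %pair = cmpxchg i32* %ptr, i32 %expected, i32 %new seq_cst seq_cst
      %old = extractvalue { i32, i1 } %pair, 0
      ret i32 %old
    }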
445 It is often easiest for backends to use AtomicExpandPass to lower some of the | |
446 atomic constructs. Here are some lowerings it can do: | |
447 | |
448 * cmpxchg -> loop with load-linked/store-conditional | |
449 by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``, | |
450 ``emitStoreConditional()`` | |
451 * large loads/stores -> ll-sc/cmpxchg | |
452 by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()`` | |
453 * strong atomic accesses -> monotonic accesses + fences | |
454 by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()`` | |
455 and ``emitTrailingFence()`` | |
456 * atomic rmw -> loop with cmpxchg or load-linked/store-conditional | |
457 by overriding ``expandAtomicRMWInIR()`` | |
458 | |
459 For an example of all of these, look at the ARM backend. |