comparison docs/Atomics.rst @ 83:60c9769439b8 LLVM3.7

LLVM 3.7
author Tatsuki IHA <e125716@ie.u-ryukyu.ac.jp>
date Wed, 18 Feb 2015 14:55:36 +0900
parents 54457678186b
children afa8332a0e37
comparison
78:af83660cff7b 83:60c9769439b8
clarified in the IR.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The new C++11 ``<atomic>`` header. (`C++11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)
[...]

memory operation can happen on any thread between the load and store.

A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
A Monotonic load followed by an Acquire fence is roughly equivalent to an
Acquire load, and a Monotonic store following a Release fence is roughly
equivalent to a Release store. SequentiallyConsistent fences behave as both
an Acquire and a Release fence, and offer some additional complicated
guarantees; see the C++11 standard for details.
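
A minimal IR sketch of the load/fence equivalence just described (``%ptr`` is
a hypothetical pointer):

.. code-block:: llvm

  ; These two forms are roughly equivalent:
  %a = load atomic i32, i32* %ptr monotonic, align 4
  fence acquire
  ; ... and ...
  %b = load atomic i32, i32* %ptr acquire, align 4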

Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
therefore an instruction which is wider than the target natively supports can be
impossible to generate.
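
For example (a hedged sketch, with a hypothetical ``%p``), the following may
be impossible to generate on a 32-bit target without native 64-bit atomics:

.. code-block:: llvm

  ; The backend reports an error if it cannot lower this lock-free.
  %v = load atomic i64, i64* %p seq_cst, align 8
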
[...]

Unordered
---------

Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
guarantees the operation to be lock-free, so it does not depend on the data
being part of a special atomic structure or depend on a separate per-process
global lock. Note that code generation will fail for unsupported atomic
operations; if you need such an operation, use explicit locking.
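
A minimal sketch of Unordered accesses (hypothetical pointers); these may race
without producing undefined behavior:

.. code-block:: llvm

  %x = load atomic i32, i32* %p unordered, align 4
  store atomic i32 %x, i32* %q unordered, align 4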

Relevant standard
  This is intended to match the Java memory model for shared variables.

Notes for frontends
[...]

primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.
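
A minimal sketch of a typical Monotonic use, a shared counter where only the
per-address ordering matters (``@counter`` is a hypothetical global):

.. code-block:: llvm

  ; Increment a statistics counter; no cross-address synchronization implied.
  %old = atomicrmw add i32* @counter, i32 1 monotonic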

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
  standards for the exact definition.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution. The
  guarantees in terms of synchronization are very weak, so make sure these are
[...]

Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.
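
A minimal sketch of the consumer side of that pattern (``@flag`` is a
hypothetical guard variable; the matching Release store appears in the next
section):

.. code-block:: llvm

  ; Once the guard is observed set, reading the published data with
  ; ordinary loads is safe.
  %guard = load atomic i32, i32* @flag acquire, align 4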

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
  used for C++11/C11 ``memory_order_consume``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.
[...]

Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.
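
The matching producer side of the sketch from the Acquire section (again with
the hypothetical ``@flag``):

.. code-block:: llvm

  ; Ordinary stores that publish the data come first, then:
  store atomic i32 1, i32* @flag release, align 4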

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.
[...]

AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).
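
A minimal sketch (hypothetical ``%p`` and ``%new``); ``acq_rel`` fits
read-modify-write operations:

.. code-block:: llvm

  ; Publishes earlier writes (Release) and observes the previous
  ; owner's writes (Acquire) in one operation.
  %old = atomicrmw xchg i32* %p, i32 %new acq_rel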

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.
[...]

SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.
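
For instance, a C11 ``atomic_store(&x, 1)`` with the default
``memory_order_seq_cst`` might be emitted as (``@x`` hypothetical):

.. code-block:: llvm

  store atomic i32 1, i32* @x seq_cst, align 4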

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
[...]

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on
  orderings. They can be useful for passes that are aware of atomics, for
  example to do DSE across a single atomic access, but not across a
  release-acquire pair (see MemoryDependencyAnalysis for an example of this,
  and the sketch after this list).

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.

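A hedged sketch of the DSE point above (hypothetical pointers): the release
store may publish ``%p`` to a thread performing a matching acquire load, so
the first store cannot simply be deleted.

.. code-block:: llvm

  store i32 1, i32* %p                              ; candidate dead store
  %v = load atomic i32, i32* %q monotonic, align 4  ; DSE may look across this
  store atomic i32 1, i32* @flag release, align 4   ; ... but not across this
  store i32 2, i32* %p                              ; would-be killing store
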
To support optimizing around atomic operations, make sure you are using the
right predicates; everything should work if that is done. If your pass should
[...]

monotonic operations like a read+write to a memory location, and anything
stricter than that like a nothrow call.

* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
  important. It is possible in some cases for DSE to operate across a stronger
  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
  MemoryDependencyAnalysis (which is also used by other passes like GVN).

* Folding a load: Any atomic load from a constant global can be constant-folded,
  because it cannot be observed. Similar reasoning allows scalarrepl with
  atomic loads and stores.
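
A minimal sketch of that folding (hypothetical global):

.. code-block:: llvm

  @g = constant i32 42

  ; No thread can legally write @g, so this load can fold to 42.
  %v = load atomic i32, i32* @g acquire, align 4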

Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
``setInsertFencesForAtomic()`` was used.

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively. This should get fixed at some point.

[...]

implemented in a lock-free manner. It is expected that backends will give an
error when given an operation which cannot be implemented. (The LLVM code
generator is not very helpful here at the moment, but hopefully that will
change.)

On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated. cmpxchg uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
[...]

and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.).
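
A hedged sketch of such a loop in IR, using the ARM exclusive-access
intrinsics (simplified; a real lowering also deals with memory ordering):

.. code-block:: llvm

  loop:
    ; LDREX: load and take an exclusive monitor on the address.
    %loaded = call i32 @llvm.arm.ldrex.p0i32(i32* %addr)
    %should = icmp eq i32 %loaded, %expected
    br i1 %should, label %try_store, label %done
  try_store:
    ; STREX: store only if the monitor is still held; 0 means success.
    %failed = call i32 @llvm.arm.strex.p0i32(i32 %new, i32* %addr)
    %ok = icmp eq i32 %failed, 0
    br i1 %ok, label %done, label %loop
  done: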

It is often easiest for backends to use AtomicExpandPass to lower some of the
atomic constructs. Here are some lowerings it can do:

* cmpxchg -> loop with load-linked/store-conditional
  by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``,
  ``emitStoreConditional()``
* large loads/stores -> ll-sc/cmpxchg
  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
* strong atomic accesses -> monotonic accesses + fences
  by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()``
  and ``emitTrailingFence()``
* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
  by overriding ``expandAtomicRMWInIR()``

For an example of all of these, look at the ARM backend.
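
A hedged sketch of the third lowering above (strong accesses rewritten as
monotonic accesses plus fences); the exact fences are target-chosen via
``emitLeadingFence()`` and ``emitTrailingFence()``:

.. code-block:: llvm

  ; Before:
  store atomic i32 %v, i32* %p seq_cst, align 4

  ; After (roughly):
  fence release
  store atomic i32 %v, i32* %p monotonic, align 4
  fence seq_cst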