comparison docs/Atomics.rst @ 83:60c9769439b8 LLVM3.7
author | Tatsuki IHA <e125716@ie.u-ryukyu.ac.jp> |
---|---|
date | Wed, 18 Feb 2015 14:55:36 +0900 |
parents | 54457678186b |
children | afa8332a0e37 |
78:af83660cff7b | 83:60c9769439b8 |
---|---|
16 clarified in the IR. | 16 clarified in the IR. |
17 | 17 |
18 The atomic instructions are designed specifically to provide readable IR and | 18 The atomic instructions are designed specifically to provide readable IR and |
19 optimized code generation for the following: | 19 optimized code generation for the following: |
20 | 20 |
21 * The new C++0x ``<atomic>`` header. (`C++0x draft available here | 21 * The new C++11 ``<atomic>`` header. (`C++11 draft available here |
22 <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C1x draft available here | 22 <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here |
23 <http://www.open-std.org/jtc1/sc22/wg14/>`_.) | 23 <http://www.open-std.org/jtc1/sc22/wg14/>`_.) |
24 | 24 |
25 * Proper semantics for Java-style memory, for both ``volatile`` and regular | 25 * Proper semantics for Java-style memory, for both ``volatile`` and regular |
26 shared variables. (`Java Specification | 26 shared variables. (`Java Specification |
27 <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_) | 27 <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_) |
113 memory operation can happen on any thread between the load and store. | 113 memory operation can happen on any thread between the load and store. |
114 | 114 |
115 A ``fence`` provides Acquire and/or Release ordering which is not part of | 115 A ``fence`` provides Acquire and/or Release ordering which is not part of |
116 another operation; it is normally used along with Monotonic memory operations. | 116 another operation; it is normally used along with Monotonic memory operations. |
117 A Monotonic load followed by an Acquire fence is roughly equivalent to an | 117 A Monotonic load followed by an Acquire fence is roughly equivalent to an |
118 Acquire load. | 118 Acquire load, and a Monotonic store following a Release fence is roughly |
119 equivalent to a Release store. SequentiallyConsistent fences behave as both | |
120 an Acquire and a Release fence, and offer some additional complicated | |
121 guarantees; see the C++11 standard for details. | |
119 | 122 |
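A minimal LLVM IR sketch of the fence pairing described above (the function and value names are illustrative, not from the original document):

.. code-block:: llvm

    define i32 @load_then_acquire_fence(i32* %ptr) {
      ; A Monotonic load followed by an Acquire fence is roughly
      ; equivalent to a single Acquire load of %ptr.
      %val = load atomic i32, i32* %ptr monotonic, align 4
      fence acquire
      ret i32 %val
    }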
120 Frontends generating atomic instructions generally need to be aware of the | 123 Frontends generating atomic instructions generally need to be aware of the |
121 target to some degree; atomic instructions are guaranteed to be lock-free, and | 124 target to some degree; atomic instructions are guaranteed to be lock-free, and |
122 therefore an instruction which is wider than the target natively supports can be | 125 therefore an instruction which is wider than the target natively supports can be |
123 impossible to generate. | 126 impossible to generate. |
175 Unordered | 178 Unordered |
176 --------- | 179 --------- |
177 | 180 |
178 Unordered is the lowest level of atomicity. It essentially guarantees that races | 181 Unordered is the lowest level of atomicity. It essentially guarantees that races |
179 produce somewhat sane results instead of having undefined behavior. It also | 182 produce somewhat sane results instead of having undefined behavior. It also |
180 guarantees the operation to be lock-free, so it do not depend on the data being | 183 guarantees the operation to be lock-free, so it does not depend on the data |
181 part of a special atomic structure or depend on a separate per-process global | 184 being part of a special atomic structure or depend on a separate per-process |
182 lock. Note that code generation will fail for unsupported atomic operations; if | 185 global lock. Note that code generation will fail for unsupported atomic |
183 you need such an operation, use explicit locking. | 186 operations; if you need such an operation, use explicit locking. |
184 | 187 |
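As an illustrative sketch (names are hypothetical), an Unordered load in IR: racing accesses still yield some value that was actually stored, rather than undefined behavior, and the operation stays lock-free.

.. code-block:: llvm

    define i32 @read_shared(i32* %shared) {
      ; Unordered: lock-free, but with no ordering guarantees beyond
      ; returning some previously stored value of %shared.
      %val = load atomic i32, i32* %shared unordered, align 4
      ret i32 %val
    }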
185 Relevant standard | 188 Relevant standard |
186 This is intended to match the Java memory model for shared variables. | 189 This is intended to match the Java memory model for shared variables. |
187 | 190 |
188 Notes for frontends | 191 Notes for frontends |
219 primitives, although it does not provide any general synchronization. It | 222 primitives, although it does not provide any general synchronization. It |
220 essentially guarantees that if you take all the operations affecting a specific | 223 essentially guarantees that if you take all the operations affecting a specific |
221 address, a consistent ordering exists. | 224 address, a consistent ordering exists. |
222 | 225 |
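A small sketch (names are illustrative) of a Monotonic read-modify-write, the typical fit for a statistics counter where only per-address consistency matters:

.. code-block:: llvm

    define void @bump_counter(i32* %counter) {
      ; Monotonic (C11/C++11 memory_order_relaxed): a consistent order
      ; exists for operations on %counter, but nothing else is ordered.
      %old = atomicrmw add i32* %counter, i32 1 monotonic
      ret void
    }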
223 Relevant standard | 226 Relevant standard |
224 This corresponds to the C++0x/C1x ``memory_order_relaxed``; see those | 227 This corresponds to the C++11/C11 ``memory_order_relaxed``; see those |
225 standards for the exact definition. | 228 standards for the exact definition. |
226 | 229 |
227 Notes for frontends | 230 Notes for frontends |
228 If you are writing a frontend which uses this directly, use with caution. The | 231 If you are writing a frontend which uses this directly, use with caution. The |
229 guarantees in terms of synchronization are very weak, so make sure these are | 232 guarantees in terms of synchronization are very weak, so make sure these are |
249 | 252 |
250 Acquire provides a barrier of the sort necessary to acquire a lock to access | 253 Acquire provides a barrier of the sort necessary to acquire a lock to access |
251 other memory with normal loads and stores. | 254 other memory with normal loads and stores. |
252 | 255 |
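An illustrative sketch of an Acquire load of a guard flag (names are hypothetical); it pairs with the Release store sketched in the next section.

.. code-block:: llvm

    define i32 @wait_for_flag(i32* %flag) {
      ; Acquire: later loads and stores may not be reordered before
      ; this load, which is what entering a critical section needs.
      %f = load atomic i32, i32* %flag acquire, align 4
      ret i32 %f
    }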
253 Relevant standard | 256 Relevant standard |
254 This corresponds to the C++0x/C1x ``memory_order_acquire``. It should also be | 257 This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be |
255 used for C++0x/C1x ``memory_order_consume``. | 258 used for C++11/C11 ``memory_order_consume``. |
256 | 259 |
257 Notes for frontends | 260 Notes for frontends |
258 If you are writing a frontend which uses this directly, use with caution. | 261 If you are writing a frontend which uses this directly, use with caution. |
259 Acquire only provides a semantic guarantee when paired with a Release | 262 Acquire only provides a semantic guarantee when paired with a Release |
260 operation. | 263 operation. |
279 | 282 |
280 Release is similar to Acquire, but with a barrier of the sort necessary to | 283 Release is similar to Acquire, but with a barrier of the sort necessary to |
281 release a lock. | 284 release a lock. |
282 | 285 |
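A matching illustrative sketch of a Release store publishing a flag (names are hypothetical), pairing with the Acquire load above:

.. code-block:: llvm

    define void @publish_flag(i32* %flag) {
      ; Release: earlier loads and stores may not be reordered after
      ; this store, which is what leaving a critical section needs.
      store atomic i32 1, i32* %flag release, align 4
      ret void
    }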
283 Relevant standard | 286 Relevant standard |
284 This corresponds to the C++0x/C1x ``memory_order_release``. | 287 This corresponds to the C++11/C11 ``memory_order_release``. |
285 | 288 |
286 Notes for frontends | 289 Notes for frontends |
287 If you are writing a frontend which uses this directly, use with caution. | 290 If you are writing a frontend which uses this directly, use with caution. |
288 Release only provides a semantic guarantee when paired with an Acquire | 291 Release only provides a semantic guarantee when paired with an Acquire |
289 operation. | 292 operation. |
305 | 308 |
306 AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release | 309 AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release |
307 barrier (for fences and operations which both read and write memory). | 310 barrier (for fences and operations which both read and write memory). |
308 | 311 |
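A brief sketch, with illustrative names, of an AcquireRelease read-modify-write:

.. code-block:: llvm

    define i32 @exchange(i32* %ptr, i32 %new) {
      ; acq_rel: this atomicrmw both reads and writes %ptr, so it
      ; carries an Acquire barrier and a Release barrier at once.
      %old = atomicrmw xchg i32* %ptr, i32 %new acq_rel
      ret i32 %old
    }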
309 Relevant standard | 312 Relevant standard |
310 This corresponds to the C++0x/C1x ``memory_order_acq_rel``. | 313 This corresponds to the C++11/C11 ``memory_order_acq_rel``. |
311 | 314 |
312 Notes for frontends | 315 Notes for frontends |
313 If you are writing a frontend which uses this directly, use with caution. | 316 If you are writing a frontend which uses this directly, use with caution. |
314 Acquire only provides a semantic guarantee when paired with a Release | 317 Acquire only provides a semantic guarantee when paired with a Release |
315 operation, and vice versa. | 318 operation, and vice versa. |
328 SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads | 331 SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads |
329 and Release semantics for stores. Additionally, it guarantees that a total | 332 and Release semantics for stores. Additionally, it guarantees that a total |
330 ordering exists between all SequentiallyConsistent operations. | 333 ordering exists between all SequentiallyConsistent operations. |
331 | 334 |
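An illustrative sketch of SequentiallyConsistent accesses (names are hypothetical); all ``seq_cst`` operations additionally participate in a single total order.

.. code-block:: llvm

    define i32 @seq_cst_pair(i32* %ptr) {
      ; seq_cst: Release semantics for the store, Acquire semantics for
      ; the load, plus one total order over all seq_cst operations.
      store atomic i32 1, i32* %ptr seq_cst, align 4
      %v = load atomic i32, i32* %ptr seq_cst, align 4
      ret i32 %v
    }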
332 Relevant standard | 335 Relevant standard |
333 This corresponds to the C++0x/C1x ``memory_order_seq_cst``, Java volatile, and | 336 This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and |
334 the gcc-compatible ``__sync_*`` builtins which do not specify otherwise. | 337 the gcc-compatible ``__sync_*`` builtins which do not specify otherwise. |
335 | 338 |
336 Notes for frontends | 339 Notes for frontends |
337 If a frontend is exposing atomic operations, these are much easier to reason | 340 If a frontend is exposing atomic operations, these are much easier to reason |
338 about for the programmer than other kinds of operations, and using them is | 341 about for the programmer than other kinds of operations, and using them is |
366 | 369 |
367 * ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note | 370 * ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note |
368 that they return true for any operation which is volatile or at least | 371 that they return true for any operation which is volatile or at least |
369 Monotonic. | 372 Monotonic. |
370 | 373 |
374 * ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on | |
375 orderings. They can be useful for passes that are aware of atomics, for | |
376 example to do DSE across a single atomic access, but not across a | |
377 release-acquire pair (see MemoryDependencyAnalysis for an example of this). | |
378 | |
371 * Alias analysis: Note that AA will return ModRef for anything Acquire or | 379 * Alias analysis: Note that AA will return ModRef for anything Acquire or |
372 Release, and for the address accessed by any Monotonic operation. | 380 Release, and for the address accessed by any Monotonic operation. |
373 | 381 |
374 To support optimizing around atomic operations, make sure you are using the | 382 To support optimizing around atomic operations, make sure you are using the |
375 right predicates; everything should work if that is done. If your pass should | 383 right predicates; everything should work if that is done. If your pass should |
387 monotonic operations like a read+write to a memory location, and anything | 395 monotonic operations like a read+write to a memory location, and anything |
388 stricter than that like a nothrow call. | 396 stricter than that like a nothrow call. |
389 | 397 |
390 * DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can | 398 * DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can |
391 be DSE'ed in some cases, but it's tricky to reason about, and not especially | 399 be DSE'ed in some cases, but it's tricky to reason about, and not especially |
392 important. | 400 important. It is possible in some cases for DSE to operate across a stronger |
401 atomic operation, but it is fairly tricky. DSE delegates this reasoning to | |
402 MemoryDependencyAnalysis (which is also used by other passes like GVN); a sketch of the simple Unordered case follows this list. | |
393 | 403 |
394 * Folding a load: Any atomic load from a constant global can be constant-folded, | 404 * Folding a load: Any atomic load from a constant global can be constant-folded, |
395 because it cannot be observed. Similar reasoning allows scalarrepl with | 405 because it cannot be observed. Similar reasoning allows scalarrepl with |
396 atomic loads and stores. | 406 atomic loads and stores. |
397 | 407 |
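As referenced in the DSE item above, a minimal sketch (illustrative names) of the straightforward case: the first Unordered store is dead because the second overwrites it with no intervening read.

.. code-block:: llvm

    define void @dead_store(i32* %p) {
      ; DSE may delete this store just as it would a normal dead store.
      store atomic i32 1, i32* %p unordered, align 4
      store atomic i32 2, i32* %p unordered, align 4
      ret void
    }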
398 Atomics and Codegen | 408 Atomics and Codegen |
399 =================== | 409 =================== |
400 | 410 |
401 Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes. | 411 Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes. |
402 On architectures which use barrier instructions for all atomic ordering (like | 412 On architectures which use barrier instructions for all atomic ordering (like |
403 ARM), appropriate fences are split out as the DAG is built. | 413 ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if |
414 ``setInsertFencesForAtomic()`` was used. | |
404 | 415 |
405 The MachineMemOperand for all atomic operations is currently marked as volatile; | 416 The MachineMemOperand for all atomic operations is currently marked as volatile; |
406 this is not correct in the IR sense of volatile, but CodeGen handles anything | 417 this is not correct in the IR sense of volatile, but CodeGen handles anything |
407 marked volatile very conservatively. This should get fixed at some point. | 418 marked volatile very conservatively. This should get fixed at some point. |
408 | 419 |
413 implemented in a lock-free manner. It is expected that backends will give an | 424 implemented in a lock-free manner. It is expected that backends will give an |
414 error when given an operation which cannot be implemented. (The LLVM code | 425 error when given an operation which cannot be implemented. (The LLVM code |
415 generator is not very helpful here at the moment, but hopefully that will | 426 generator is not very helpful here at the moment, but hopefully that will |
416 change.) | 427 change.) |
417 | 428 |
418 The implementation of atomics on LL/SC architectures (like ARM) is currently a | |
419 bit of a mess; there is a lot of copy-pasted code across targets, and the | |
420 representation is relatively unsuited to optimization (it would be nice to be | |
421 able to optimize loops involving cmpxchg etc.). | |
422 | |
423 On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores | 429 On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores |
424 generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent | 430 generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent |
425 fences generate an ``MFENCE``, other fences do not cause any code to be | 431 fences generate an ``MFENCE``, other fences do not cause any code to be |
426 generated. cmpxchg uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg`` | 432 generated. cmpxchg uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg`` |
427 uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all | 433 uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all |
433 and SequentiallyConsistent semantics require barrier instructions for every such | 439 and SequentiallyConsistent semantics require barrier instructions for every such |
434 operation. Loads and stores generate normal instructions. ``cmpxchg`` and | 440 operation. Loads and stores generate normal instructions. ``cmpxchg`` and |
435 ``atomicrmw`` can be represented using a loop with LL/SC-style instructions | 441 ``atomicrmw`` can be represented using a loop with LL/SC-style instructions |
436 which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX`` | 442 which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX`` |
437 on ARM, etc.). | 443 on ARM, etc.). |
444 | |
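An IR-level sketch (hypothetical names) of a ``cmpxchg`` of the kind that, per the paragraph above, is represented as an LL/SC loop on targets like ARM:

.. code-block:: llvm

    define i32 @compare_and_swap(i32* %ptr, i32 %expected, i32 %new) {
      ; On LL/SC architectures this becomes a LDREX/STREX retry loop.
      %pair = cmpxchg i32* %ptr, i32 %expected, i32 %new seq_cst seq_cst
      %old = extractvalue { i32, i1 } %pair, 0
      ret i32 %old
    }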
445 It is often easiest for backends to use AtomicExpandPass to lower some of the | |
446 atomic constructs. Here are some lowerings it can do: | |
447 | |
448 * cmpxchg -> loop with load-linked/store-conditional | |
449 by overriding ``hasLoadLinkedStoreConditional()``, ``emitLoadLinked()``, | |
450 ``emitStoreConditional()`` | |
451 * large loads/stores -> ll-sc/cmpxchg | |
452 by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()`` | |
453 * strong atomic accesses -> monotonic accesses + fences | |
454 by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()`` | |
455 and ``emitTrailingFence()`` | |
456 * atomic rmw -> loop with cmpxchg or load-linked/store-conditional | |
457 by overriding ``expandAtomicRMWInIR()`` | |
458 | |
459 For an example of all of these, look at the ARM backend. |