# MLIR Rationale

This document is intended to capture some of the alternatives considered and
open debates in the design of MLIR, along with the rationale for certain
decisions we made. This is not intended to be a "finely groomed" document - we
prefer the ability to dump in interesting tidbits without worrying too much
about their consistency or readability.

[TOC]

## Abstract

MLIR is a compiler intermediate representation with similarities to traditional
three-address SSA representations (like
[LLVM IR](http://llvm.org/docs/LangRef.html) or
[SIL](https://github.com/apple/swift/blob/master/docs/SIL.rst)), but which
introduces notions from polyhedral loop optimization work as first class
concepts. This hybrid design is optimized to represent, analyze, and transform
high level dataflow graphs as well as target-specific code generated for high
performance data parallel systems. Beyond its representational capabilities, its
single continuous design provides a framework to lower from dataflow graphs to
high performance target specific code.

MLIR stands for one of "Multi-Level IR", "Multi-dimensional Loop IR", "Machine
Learning IR", or "Mid Level IR"; we prefer the first. This document only
provides the rationale behind MLIR -- its actual
[specification document](LangRef.md) and other content are hosted elsewhere.

## Introduction and Motivation

The Multi-Level Intermediate Representation (MLIR) is intended for easy
expression and optimization of computations involving deep loop nests and dense
matrices of high dimensionality. It is thus well-suited to deep learning
computations in particular. Yet it is general enough to also represent arbitrary
sequential computation. The representation allows high-level optimization and
parallelization for a wide range of parallel architectures including those with
deep memory hierarchies --- general-purpose multicores, GPUs, and specialized
neural network accelerators.

MLIR uses ideas drawn from the IRs of LLVM and Swift for lower level constructs
while combining them with ideas from the polyhedral abstraction to represent
loop nests, multidimensional data (tensors), and transformations on these
entities as first class concepts in the IR.

MLIR is a multi-level IR: it can represent code at multiple levels of
abstraction, from a domain-specific representation such as HLO or TensorFlow
graphs all the way down to the machine level. MLIR is able to represent
arbitrary control flow and arbitrary data accesses, and is general enough to
represent nearly all sequential computation. This is a key distinction from
existing polyhedral representation implementations (such as LLVM
[Polly](https://polly.llvm.org/)) that are able to use the polyhedral
abstraction only in a way isolated from the LLVM IR and only for affine loop
nests, i.e., portions of the code where array accesses, loop bounds, and
conditionals are regular (involve linear functions of loop iterators and
constant symbols). The presence of statically unpredictable data accesses or
control flow does not preclude representation in MLIR, but only limits to a
certain extent the ability to reason about and apply transformations using the
polyhedral abstraction.

Maps, sets, and relations with affine constraints are the core structures
underlying a polyhedral representation of high-dimensional loop nests and
multidimensional arrays. These structures are represented as textual
expressions in a form close to their mathematical form. These structures are
used to capture loop nests, tensor data structures, and how they are reordered
and mapped for a target architecture. All structured or "conforming" loops are
captured as part of the polyhedral information, and so are tensor variables,
their layouts, and subscripted accesses to these tensors in memory.

The information captured in the IR allows a compact expression of all loop
transformations, data remappings, explicit copying necessary for explicitly
addressed memory in accelerators, mapping to pre-tuned, expert-written
primitives, and mapping to specialized vector instructions. Loop transformations
that can be easily implemented include the body of affine transformations: these
subsume all traditional loop transformations (unimodular and non-unimodular)
such as loop tiling, interchange, permutation, skewing, scaling, relative
shifting, reversal, fusion, and distribution/fission. Transformations on data
layout such as padding and transforming to blocked layouts are also represented
well via affine layout maps.
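
As an illustration of the last point, here is a minimal sketch of how a blocked
layout can be expressed with an affine layout map. The map name `#tiled_2d`, the
32x32 block size, and the buffer shape are assumptions made up for this example.

```mlir
// Hypothetical blocked layout: element (d0, d1) lives in block
// (d0 floordiv 32, d1 floordiv 32) at offset (d0 mod 32, d1 mod 32).
#tiled_2d = (d0, d1) -> (d0 floordiv 32, d1 floordiv 32, d0 mod 32, d1 mod 32)

// A buffer carrying this layout in its memref type.
%buf = alloc() : memref<256x256xf32, #tiled_2d>
```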

MLIR's design allows a progressive lowering to target-specific forms. Besides
high-level transformations for loop nests and data layouts that a typical
mid-level optimizer is expected to deal with, MLIR is also designed to perform
certain low-level scheduling and mapping decisions that a typical backend IR is
entrusted with: these include mapping to specialized vector instructions,
auto-vectorization, and software pipelining. The need to support these
transformations stems from the fact that neural network accelerators have
specialized units that deal with large chunks of data whose computation maps
back to chunks of more than one loop of the loop nests as viewed by a program at
a level closer to the original specification. Such specialized units or
instructions operate on multidimensional data chunks from a programmer's
viewpoint. This makes it hard or infeasible for a backend operating on a very
low-level IR close to assembly to lift and reconstruct loops and perform such a
mapping. This is in contrast to classic instruction selection and scheduling in
today's compilers, which primarily deal only with the body of the innermost
loop. MLIR also facilitates automatic mapping to expert pre-tuned primitives or
vendor libraries operating on data at higher levels (or at the highest level) of
the memory hierarchy.

In summary, MLIR is convenient for and closed under the kind of transformations
needed to lower to general-purpose as well as specialized accelerators. It also
allows one to build modular and reusable target-independent and target-dependent
passes.

## Design Decisions

This section sheds light on some of the design decisions -- some of these are
indirectly implied by the specification document.

### Loads and stores

The 'load' and 'store' instructions are specifically crafted to fully resolve to
an element of a memref. These instructions take as arguments a memref and n
indices for an n-ranked memref (n+1 operands in total). This disallows the
equivalent of pointer arithmetic or the ability to index into the same memref in
other ways (something which C arrays allow for example). Furthermore, for the
affine constructs, the compiler can follow use-def chains (e.g. through
[affine.apply operations](Dialects/Affine.md#affineapply-operation) or through
the map attributes of [affine operations](Dialects/Affine.md#Operations)) to
precisely analyze references at compile-time using polyhedral techniques. This
is possible because of the
[restrictions on dimensions and symbols](Dialects/Affine.md#restrictions-on-dimensions-and-symbols).
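
As a minimal sketch (the buffer shape and names are made up for illustration),
every access names all indices of the memref, so each load and store resolves to
exactly one element:

```mlir
%A = alloc() : memref<16x32xf32>
affine.for %i = 0 to 16 {
  affine.for %j = 0 to 32 {
    // Reads and writes address exactly one element; there is no way to
    // form a pointer into the middle of %A.
    %v = affine.load %A[%i, %j] : memref<16x32xf32>
    affine.store %v, %A[%i, %j] : memref<16x32xf32>
  }
}
```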

A scalar of element-type (a primitive type or a vector type) that is stored in
memory is modeled as a 0-d memref. This is also necessary for scalars that are
live out of for loops and if conditionals in a function, for which we don't yet
have an SSA representation --
[an extension](#affineif-and-affinefor-extensions-for-escaping-scalars) to allow
that is described later in this doc.

### Symbols and types

The current MLIR disallows use of symbols in types. For example, when a tensor
or memref dimension is statically unknown, it is denoted in the type as '?'. An
SSA symbol is then bound to it when a memref is created. The actual value of the
unknown dimension can be queried using the "dim" builtin as shown below.

Example:

```mlir
func @foo(...) {
  %A = alloc(%N) : memref<8x?xf32, #lmap>
  ...
  call @bar(%A) : (memref<8x?xf32, #lmap>) -> ()
}

func @bar(%A : memref<8x?xf32, #lmap>) {
  // The type of %A indicates that %A has 8 rows and an unknown number of
  // columns. The number of columns is queried dynamically using the dim
  // instruction.
  %N = dim %A, 1 : memref<8x?xf32, #lmap>
  %cf1 = constant 1.0 : f32

  affine.for %i = 0 to 8 {
    affine.for %j = 0 to %N {
      // A[i, j] += 1
      %s1 = affine.load %A[%i, %j] : memref<8x?xf32, #lmap>
      %s2 = addf %s1, %cf1 : f32
      affine.store %s2, %A[%i, %j] : memref<8x?xf32, #lmap>
    }
  }
  return
}
```

An alternative design is to embed the reference to symbols directly in the
type - memref<8x%Nxf32>. We went for the current approach in MLIR because it
simplifies the design --- types remain immutable when the values of symbols
change.

### Block Arguments vs PHI nodes

MLIR Regions represent SSA using "[block arguments](LangRef.md#blocks)" rather
than [PHI instructions](http://llvm.org/docs/LangRef.html#i-phi) used in LLVM.
This choice is representationally identical (the same constructs can be
represented in either form) but block arguments have several advantages:

1.  LLVM PHI nodes always have to be kept at the top of a block, and
    transformations frequently have to manually skip over them. This is defined
    away with BB arguments.
1.  LLVM has a separate function Argument node. This is defined away with BB
    arguments, because the arguments to the entry block serve this purpose.
1.  Blocks of PHI nodes in LLVM execute atomically, which is surprising and
    confusing to compiler engineers, and it is easy to introduce bugs with this
    (very related to the
    "[lost copy](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.524.5461&rep=rep1&type=pdf)"
    problem in SSA lowering literature). With the BB argument representation,
    this confusion is defined away.
1.  The entry list of PHI nodes in LLVM is unordered, and some blocks have
    thousands of predecessors (e.g. unwind blocks). This can cause long compile
    time problems because transformations have to linearly scan this list. This
    is defined away with the BB argument representation.
1.  LLVM has no way to represent values that are available only in one successor
    but not the other, e.g. its invoke instruction cannot produce the exception
    value JUST on the exception edge. Instead, the
    [landingpad instruction](http://llvm.org/docs/LangRef.html#landingpad-instruction)
    is a hack used to represent this. MLIR doesn't make use of this capability,
    but SIL uses it extensively, e.g. in the
    [switch_enum instruction](https://github.com/apple/swift/blob/master/docs/SIL.rst#switch-enum).

For more context, block arguments were previously used in the Swift
[SIL Intermediate Representation](https://github.com/apple/swift/blob/master/docs/SIL.rst),
and are described in
[a talk on YouTube](https://www.youtube.com/watch?v=Ntj8ab-5cvE). The section of
interest [starts here](https://youtu.be/Ntj8ab-5cvE?t=596).
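
To make the representation concrete, here is a minimal sketch (the function name
and values are made up) of a value being passed to a successor block as a block
argument where LLVM would have used a PHI node:

```mlir
func @block_args(%cond : i1, %a : i32, %b : i32) -> i32 {
  // Each edge passes the value it wants ^bb1 to see; there is no PHI node.
  cond_br %cond, ^bb1(%a : i32), ^bb1(%b : i32)

^bb1(%v : i32):
  return %v : i32
}
```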

### Index type disallowed in vector/tensor/memref types

Index types are not allowed as elements of `vector`, `tensor` or `memref` type.
Index types are intended to be used for platform-specific "size" values and may
appear in subscripts, sizes of aggregate types and affine expressions. They are
also tightly coupled with `affine.apply` and affine.load/store operations;
having `index` type is a necessary precondition for a value to be acceptable by
these operations. While it may be useful to have `memref<?xindex>` to express
indirect accesses, e.g. sparse matrix manipulations or lookup tables, it creates
problems MLIR is not ready to address yet. MLIR needs to internally store
constants of aggregate types and emit code operating on values of those types,
which are subject to target-specific size and alignment constraints. Since MLIR
does not have a target description mechanism at the moment, it cannot reliably
emit such code. Moreover, some platforms may not support vectors of type
equivalent to `index`.

Indirect access use cases can be alternatively supported by providing an
`index_cast` instruction that allows for conversion between `index` and
fixed-width integer types, at the SSA value level. It has the additional benefit
of supporting smaller integer types, e.g. `i8` or `i16`, for small indices
instead of the (presumably larger) `index` type.
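
A minimal sketch of this pattern (the buffers `%indices` and `%data` and the
index `%k` are hypothetical): indices are stored as a fixed-width integer type
and converted with `index_cast` right before being used for addressing.

```mlir
// Load a stored i32 index, widen it to `index`, and use it to address %data.
%ival = load %indices[%k] : memref<128xi32>
%idx = index_cast %ival : i32 to index
%v = load %data[%idx] : memref<1024xf32>
```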

### Bit width of non-primitive types and `index` is undefined

The bit width of a compound type is not defined by MLIR; it may be defined by a
specific lowering pass. In MLIR, bit width is a property of certain primitive
_types_, in particular integers and floats. It is equal to the number that
appears in the type definition, e.g. the bit width of `i32` is `32`, as is the
bit width of `f32`. The bit width is not _necessarily_ related to the amount of
memory (in bytes) or the size of a register (in bits) that is necessary to store
the value of the given type. These quantities are target- and ABI-specific and
should be defined during the lowering process rather than imposed from above.
For example, `vector<3xi57>` is likely to be lowered to a vector of four 64-bit
integers, so that its storage requirement is `4 x 64 / 8 = 32` bytes, rather
than `(3 x 57) ceildiv 8 = 22` bytes as can be naively computed from the
bitwidth. Individual components of MLIR that allocate space for storing values
may use the bit size as the baseline and query the target description when it is
introduced.

The bit width is not defined for dialect-specific types at the MLIR level.
Dialects are free to define their own quantities for type sizes.

### Signless types

Integers in the builtin MLIR type system have a bitwidth (note that the `index`
type has a symbolic width equal to the machine word size), but they do not have
an intrinsic sign. This means that the "standard ops" operation set has things
like `addi` and `muli` which do two's complement arithmetic, but some other
operations get a sign, e.g. `divis` vs `diviu`.
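
A minimal sketch of what this looks like in the standard ops (the value names
are made up): the operands carry no sign, so addition has a single opcode, while
division names the interpretation explicitly.

```mlir
// One addition opcode regardless of how %a and %b are interpreted.
%sum = addi %a, %b : i32
// Division differs between interpretations, so the opcode carries the sign.
%qs = divis %a, %b : i32   // two's complement signed division
%qu = diviu %a, %b : i32   // unsigned division
```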

LLVM uses the [same design](http://llvm.org/docs/LangRef.html#integer-type),
which was introduced in a revamp rolled out
[in the LLVM 2.0 integer type](http://releases.llvm.org/2.0/docs/LangRef.html#t_derived).
Prior to that, from
[LLVM 1.0](http://releases.llvm.org/1.0/docs/LangRef.html#t_classifications) to
[1.9](http://releases.llvm.org/1.9/docs/LangRef.html#t_classifications), LLVM
used signed types like "sbyte" and "ubyte". This shift was important and has
served LLVM well over the years. The reason this is important is that it is a
good thing for an intermediate representation to represent the same computation
with the same instruction. Signed types got in the way, because (e.g.) an "add
of an sbyte" does the same computation as an "add of a ubyte", but the type
system made them look artificially different. This split also required casts
like "cast from sbyte to ubyte" which do nothing at the machine level. Removing
signs from the type system eliminated these problems, making the compiler
simpler.

More information about this split is available in an old
[talk on YouTube](https://www.youtube.com/watch?v=VeRaLPupGks) about LLVM 2.0.

Note that this rationale only applies to the "standard ops" dialect in which we
can express an opinion about its design. Other dialects generally try to model
an external system, and should aim to reflect its design as closely as possible.

### Splitting floating point vs integer operations

The MLIR "standard" operation set splits many integer and floating point
operations into different categories, for example `addf` vs `addi` and `cmpf` vs
`cmpi`
([following the design of LLVM](http://llvm.org/docs/LangRef.html#binary-operations)).
These instructions _are_ polymorphic on the number of elements in the type
though, for example `addf` is used with scalar floats, vectors of floats, and
tensors of floats (LLVM does the same thing with its scalar/vector types).
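
For instance, a minimal sketch of this shape polymorphism (operand names are
made up): the opcode distinguishes float from integer, while the trailing type
carries the element count.

```mlir
%x = addf %a, %b : f32                // scalar floats
%y = addf %v0, %v1 : vector<4xf32>    // vectors of floats
%z = addf %t0, %t1 : tensor<4x4xf32>  // tensors of floats
%w = addi %i0, %i1 : i32              // the integer counterpart is a separate op
```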

This split is important because floating point and integer operations are quite
different in practice: for example, floating point values include NaNs, so
[integer comparisons](http://llvm.org/docs/LangRef.html#icmp-instruction) and
[floating point comparisons](http://llvm.org/docs/LangRef.html#fcmp-instruction)
should use different comparison opcodes. On the arithmetic side of things,
floating point operations support rounding modes, floating point contractions,
["fast math"](http://llvm.org/docs/LangRef.html#fadd-instruction), and integers
may want to have two's complement overflow behavior or be undefined on
[various forms of wrapping](http://llvm.org/docs/LangRef.html#add-instruction)
for performance.

We are a long way from this sort of thing being a priority to care about in
MLIR, but since we have experience and know the right way to do this, we'd
rather design it in from the beginning.

Note that this rationale only applies to the "standard ops" dialect in which we
can express an opinion about its design. Other dialects generally try to model
an external system, and should aim to reflect its design as closely as possible.

### Specifying sign in integer comparison operations

Since integers are [signless](#signless-types), it is necessary to define the
sign for integer comparison operations. This sign indicates how to treat the
leading bit of the integer: as a sign bit (signed interpretation) or as a
regular value bit (unsigned interpretation). For example, comparing two `i4`
values `0b1000` and `0b0010` yields different results for unsigned (`8 > 2`) and
signed (`-8 < 2`) interpretations. This difference is only significant for
_order_ comparisons, but not for _equality_ comparisons. Indeed, for the latter
all bits must have the same value independently of the sign. Since both
arguments have exactly the same bit width and cannot be padded by this
operation, it is impossible to compare two values whose bit representations
would differ while the values are interpreted as equal.
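
A minimal sketch of the example above (value names are made up; `%lhs` holds
`0b1000` and `%rhs` holds `0b0010` as `i4`):

```mlir
%sgt = cmpi "sgt", %lhs, %rhs : i4  // signed:   -8 > 2 is false
%ugt = cmpi "ugt", %lhs, %rhs : i4  // unsigned:  8 > 2 is true
%eq = cmpi "eq", %lhs, %rhs : i4    // equality does not depend on the sign
```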

### Specifying comparison kind as attribute

Unlike arithmetic operations, comparison operators share several common
properties, e.g. they cannot be considered associative. In practice, comparisons
are sometimes implemented by the same instruction or its variants so it makes
sense to group them together at the IR level.

An alternative would be introducing ten distinct operators for all currently
supported kinds of integer comparisons. These operators would have increased the
number of "reserved" names used by standard operations as well as the size of
the C++ API while their implementations would have been mostly identical.

The comparison kind is internally an integer attribute. However, for the sake of
readability by humans, the custom assembly form accepts string literals that are
mapped to the underlying integer values: `cmpi "eq", %lhs, %rhs` better implies
integer equality comparison than `cmpi 0, %lhs, %rhs` where it is unclear what
gets compared to what else. This syntactic sugar is possible thanks to parser
logic redefinitions for the custom assembly form of non-builtin operations.
Supporting it in the full notation would have required changing how the main
parsing algorithm works and may have unexpected repercussions. While it would
have been possible to store the predicate as a string attribute, that would have
made it impossible to implement switching logic based on the comparison kind and
would have made attribute validity checks (one out of ten possible kinds) more
complex.
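
A minimal sketch of the two spellings of the same operation (the attribute name
`predicate` and the `eq = 0` encoding are stated here as assumptions about the
standard dialect, not as normative syntax):

```mlir
// Custom assembly form with a readable predicate string.
%r = cmpi "eq", %lhs, %rhs : i32

// Generic assembly form; the predicate is a plain integer attribute.
%r2 = "std.cmpi"(%lhs, %rhs) {predicate = 0 : i64} : (i32, i32) -> i1
```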

### 'select' operation to implement min/max

Although `min` and `max` operations are likely to occur as a result of
transforming affine loops in ML functions, we did not make them first-class
operations. Instead, we provide the `select` operation that can be combined with
`cmpi` to implement the minimum and maximum computation. Although they now
require two operations, they are likely to be emitted automatically during the
transformation inside MLIR. On the other hand, there are multiple benefits of
introducing `select`: standalone min/max would concern themselves with the
signedness of the comparison, already taken into account by `cmpi`; `select` can
support floats transparently if used after a float-comparison operation; the
lower-level targets provide `select`-like instructions making the translation
trivial.
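
For illustration, a minimal sketch of min/max built from the two operations
(value names are made up):

```mlir
// Signed maximum of %a and %b.
%gt = cmpi "sgt", %a, %b : i32
%max = select %gt, %a, %b : i32

// Unsigned minimum only changes the comparison predicate, not the select.
%lt = cmpi "ult", %a, %b : i32
%min = select %lt, %a, %b : i32
```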

This operation could have been implemented with additional control flow: `%r =
select %cond, %t, %f` is equivalent to

```mlir
^bb0:
  cond_br %cond, ^bb1(%t), ^bb1(%f)
^bb1(%r):
```

However, this control flow granularity is not available in the ML functions
where min/max, and thus `select`, are likely to appear. In addition, simpler
control flow may be beneficial for optimization in general.

### Regions

#### Attributes of type 'Block'

We considered representing regions through `ArrayAttr`s containing a list of a
special type `IRBlockAttr`, which in turn would contain a list of operations.
All attributes in MLIR are uniqued within the context, which would make the IR
inside the regions immortal for no good reason.

#### Use "inlined" functions as regions

We considered attaching a "force-inline" attribute on a function and/or a
function `call` operation. Even the minimal region support (use cases in
affine.for and affine.if existing before the regions) requires access to the
values defined in the dominating block, which is not supported by functions.
Conceptually, function bodies are instances of regions rather than the inverse;
regions can also be device kernels, alternative sections, etc.

#### Dedicated `region` operation

This would mean we have a special kind of operation that is allowed to have
regions while other operations are not. Such a distinction is similar to the
Stmt/Op difference we have had and chose to remove to make the IR simpler and
more flexible. It would also require analyses and passes to consider the
interplay between operations (e.g., an `affine.for` operation must be followed
by a region operation). Finally, a region operation can be introduced using the
current implementation, among other operations and without being special in any
sense.

#### Explicit capture of the values used in a region

Being able to use values defined outside the region implies that use-def chains
may contain uses from different nested regions. Consequently, IR transformations
and analyses can pull the instruction defining the value across region
boundaries, for example in case of TableGen-defined canonicalization patterns.
This would not be the case if all used values had been passed as region
arguments. One of the motivations for introducing regions in the IR is precisely
to enable cross-region analyses and transformations that are simpler than
inter-procedural transformations. Having uses from different regions appear in
the same use-def chain, as opposed to an additional data structure maintaining
correspondence between function call arguments as uses of the original
definitions and formal arguments as new definitions, enables such
simplification. Since individual operations now belong to blocks, which belong
to regions, it is always possible to check if the definition of the value
belongs to the same region as its particular use. The risk is that any IR
traversal will need to handle this situation explicitly, and it is easy to
forget such a check (or conversely it isn't easy to design the right check in a
tablegen pattern for example): traversing use-def chains potentially crosses
semantic barriers implicitly, making it possible to unknowingly break region
semantics. This is expected to be caught in the verifier after the
transformation.

At the same time, one may choose to pass certain or all values as region
arguments to explicitly break the use-def chains in the current proposal. This
can be combined with an attribute-imposed semantic requirement disallowing the
body of the region to refer to any value from outside it.

### Quantized integer operations

We haven't designed integer quantized operations in MLIR, but experience from
TensorFlow suggests that it is better to put information about the quantization
range/scale into the type itself, rather than having a single type like "qint8"
and putting the range/scale on attributes of the operation.

There are a few ways to do this with MLIR, including at least:

*   We could do the same thing TensorFlow does - and we will _have_ to support
    that model to some extent for compatibility.
*   We can encode the fp range of quantized integers directly into the types
    when they are constants. The best practice on this seems to be to encode the
    zero point as well as a scale factor. This ensures that 0.0 is always
    exactly representable, e.g. `qi8<-1.42, 31.23x>`.
*   We could theoretically encode dynamically determined ranges into the types
    using something like `qi8<?,?>` with the bounds being determined through the
    SSA dataflow graph dynamically - similar to how dynamic shapes are handled.

We will definitely need to do #1 for compatibility; we probably want to do #2,
and we should investigate #3 over time. That said, our short term plan is to get
more implementation experience with the rest of the system first, then come back
to re-examine the representation for quantized arithmetic when we have that
experience. When we do, we should chat with benoitjacob@ and
[read the paper](https://arxiv.org/abs/1712.05877).

### Dialect type extensions

This section describes the design decisions that shaped the dialect-extensible
type system present in MLIR.

#### Reserving dialect type kinds

Dialects that wish to define type extensions must reserve a range of type kinds
within a '.def' file within the core IR library. This means that every dialect
wishing to define custom types must modify this file, but it guarantees that all
type casting checks are performed in O(1) time.

#### Interactions between dialects

There are two different interactions between dialects that are important to
understand. When types of a dialect are:

*   In operations of other dialects

    -   For standard/builtin operations, only standard/builtin types are
        allowed. This restriction allows operations to clearly understand the
        invariants that they are working under.
    -   Outside of standard/builtin operations, dialects are expected to verify
        the allowable operation types per operation.

*   In types of other dialects

    -   For standard/builtin types, these types are allowed to contain types
        from other dialects. This simplifies the type system and removes the
        need for dialects to redefine all of the standard aggregate types, e.g.
        tensor, as well as the memref type. Dialects are expected to verify that
        a specific type is valid within a standard type, e.g. if a type can be
        an element of a tensor.
    -   For dialect types, the dialect is expected to verify any type
        invariants, e.g. if the standard tensor type can contain a specific type
        of that dialect.

#### Separating builtin and standard types

Following the separation between the built-in and standard dialect, it makes
sense to separate built-in types and standard dialect types. Built-in types are
required for the validity of the IR itself, e.g. the function type (which
appears in function signatures and generic assembly forms of operations).
Integer, float, vector, memref and tensor types, while important, are not
necessary for IR validity.

#### Unregistered types

MLIR supports unregistered operations in generic assembly form. MLIR also
supports a similar concept for types. When parsing, if the dialect for a dialect
type has not been registered, the type is modeled as an 'OpaqueType'. This
allows for types to be round-tripped without needing to link in the dialect
library that defined them. No additional information about opaque types, outside
of parsing/printing, will be available.

#### Dialect type syntax

Dialect-extended types are represented as string literals wrapped inside of the
dialect namespace. This means that the parser delegates to the dialect for
parsing specific type instances. This differs from the representation of
dialect-defined operations, which have an identifier name that the parser uses
to identify and parse them.

This representation was chosen for several reasons:

##### Dialects must provide custom type parsers

Dialect type parsing cannot plug into the existing parser infrastructure as
operations do with the OpAsmParser/Printer. Operations have a defined syntax
structure that is the same across all dialects. Types, on the other hand, may
have many different, and sometimes conflicting, parsing constraints that would
be difficult/unmaintainable to provide within a single interface.

This also has the added benefit of encouraging dialects to reuse existing
external type parsers. For example, an LLVM dialect may provide an MLIR LLVM
type that is simply a wrapper around LLVM types. The LLVM dialect would then use
the existing LLVM type parsing infrastructure.

Example:

```mlir
%s = "foo"() : () -> !llvm<"i32*">
```

##### Types do not always have canonical names

Unlike operations, types generally do not have a formal canonical name. For
example, function types have no defined keyword and integer types are defined by
a regular expression to support arbitrary bitwidth. Dialects with existing type
systems, e.g. LLVM, are likely to provide wrappers around their existing type
systems. For these wrapper types there is no simple canonical name; it's logical
to think of these types as existing within the namespace of the dialect. If a
dialect wishes to assign a canonical name to a type, it can be done via
[type aliases](LangRef.md#type-aliases).

### Tuple types

The MLIR type system provides first class support for defining
[tuple types](LangRef.md#tuple-type). This is due to the fact that `Tuple`
represents a universal concept that is likely to, and has already begun to,
present itself in many different dialects. Though this type is first class in
the type system, it merely serves to provide a common mechanism with which to
represent this concept in MLIR. As such, MLIR provides no standard operations
for interfacing with `tuple` types. It is up to dialect authors to provide
operations, e.g. extract_tuple_element, to interpret and manipulate them. When
possible, operations should prefer to use multiple results instead. These
provide a myriad of benefits, such as alleviating any need for tuple-extract
operations that merely get in the way of analysis and transformation.

### Assembly forms

MLIR supports both generic and custom assembly forms based on the following
considerations:

MLIR is an open system; it is designed to support modular and pluggable
dialects. Depending on whether there exists a corresponding dialect and whether
the dialect is plugged in, operations may or may not be registered into the MLIR
system. Yet we still need a way to investigate these operations. So the generic
assembly form is mandated by this aspect of the MLIR system. It provides a
default textual form for operations.

On the other hand, an assembly form exists to help developers investigate the
IR. The generic form serves as a safe fallback but it can be too verbose for
certain ops. Therefore, MLIR gives each dialect the choice to define a custom
assembly form for each operation according to the operation's semantics and
specific needs. The custom assembly form can de-duplicate information from the
operation to derive a more concise form, thus better facilitating the
comprehension of the IR.
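
A minimal sketch of the two forms of the same operation (the "std" namespace in
the generic form is an assumption about where the standard ops live):

```mlir
// Custom assembly form defined by the dialect.
%sum = addi %lhs, %rhs : i32

// Generic assembly form; it round-trips even when the dialect is not registered.
%sum2 = "std.addi"(%lhs, %rhs) : (i32, i32) -> i32
```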

## Examples

This section describes a few very simple examples that help understand how MLIR
represents computation.

### Non-affine control flow

```c
// A simple linear search in every row of a matrix
for (i = 0; i < N; i++) {
  for (j = 0; j < N; j++) {
    // dynamic control flow
    if (a[i][j] == key) {
      s[i] = j;
      break;
    }
  }
}
```

The presence of dynamic control flow leads to an inner non-affine function
nested in an outer function that uses affine loops.

```mlir
func @search(%A : memref<?x?xi32>, %S : memref<?xi32>, %key : i32) {
  %ni = dim %A, 0 : memref<?x?xi32>
  // This loop can be parallelized
  affine.for %i = 0 to %ni {
    call @search_body (%A, %S, %key, %i) : (memref<?x?xi32>, memref<?xi32>, i32, i32)
  }
  return
}

func @search_body(%A : memref<?x?xi32>, %S : memref<?xi32>, %key : i32, %i : i32) {
  %nj = dim %A, 1 : memref<?x?xi32>
  br ^bb1(0)

^bb1(%j : i32):
  %p1 = cmpi "slt", %j, %nj : i32
  cond_br %p1, ^bb2, ^bb5

^bb2:
  %v = affine.load %A[%i, %j] : memref<?x?xi32>
  %p2 = cmpi "eq", %v, %key : i32
  cond_br %p2, ^bb3(%j), ^bb4

^bb3(%j : i32):
  affine.store %j, %S[%i] : memref<?xi32>
  br ^bb5

^bb4:
  %jinc = addi %j, 1 : i32
  br ^bb1(%jinc)

^bb5:
  return
}
```

As per the [MLIR spec](LangRef.md), the restrictions on dimensions and symbol
identifiers to be used with the affine.apply operation only apply to accesses
inside `affine.for` and `affine.if` operations. However, an analysis of accesses
inside the called function (`@search_body`) is necessary to determine if the
`%i` loop could be parallelized: such function access analysis is
calling-context sensitive.

### Non-affine loop bounds

Loop bounds that are not affine lead to a nesting of functions as shown below.

```c
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    // Non-affine loop bound for k loop.
    for (k = 0; k < pow(2, j); k++)
      for (l = 0; l < N; l++) {
        // block loop body
        ...
      }
```

```mlir
func @outer_nest(%n : index) {
  affine.for %i = 0 to %n {
    affine.for %j = 0 to %n {
      %pow = call @pow(2, %j) : (index, index) -> index
      call @inner_nest(%pow, %n) : ...
    }
  }
  return
}

func @inner_nest(%m : index, %n : index) {
  affine.for %k = 0 to %m {
    affine.for %l = 0 to %n {
      ...
    }
  }
  return
}
```

### Reference 2D Convolution

The following example illustrates a reference implementation of a 2D
convolution, which uses an integer set `#domain` to represent valid input data
in a dilated convolution.

```mlir
// Dilation factors S0 and S1 can be constant folded if constant at compile time.
#domain = (d0, d1)[S0,S1,S2,S3]: (d0 % S0 == 0, d1 % S1 == 0, d0 >= 0, d1 >= 0,
                                  S3 - d0 - 1 >= 0, S4 - d1 - 1 >= 0)
// Identity map (shown here for illustration).
#map0 = (d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3, d4, d5, d6)

// Affine map from output to input coordinate space.
// d0 = output_h, d1 = output_w, d2 = kernel_h, d3 = kernel_w
// S0 = h_stride, S1 = w_stride, S2 = h_kernel_dilation, S3 = w_kernel_dilation
// S4 = h_pad_low, S5 = w_pad_low
//   %out0 = %0#1 * %h_stride + %0#4 * %h_kernel_dilation - %h_pad_low
//   %out1 = %0#2 * %w_stride + %0#5 * %w_kernel_dilation - %w_pad_low
#map1_0 = (d0, d1, d2, d3) [S0, S1, S2, S3, S4, S5] -> (d0 * S0 + d2 * S2 - S4)
#map1_1 = (d0, d1, d2, d3) [S0, S1, S2, S3, S4, S5] -> (d1 * S1 + d3 * S3 - S5)

// Semi-affine map to undilated input coordinate space.
// d0 = input_h, d1 = input_w, S0 = h_base_dilation, S1 = w_base_dilation.
#map2_0 = (d0, d1) [S0, S1] -> (d0 / S0)
#map2_1 = (d0, d1) [S0, S1] -> (d1 / S1)

// Conv2D shapes:
// input:  [batch, input_height, input_width, input_feature]
// kernel: [kernel_height, kernel_width, input_feature, output_feature]
// output: [batch, output_height, output_width, output_feature]
func @conv2d(%input: memref<16x1024x1024x3xf32, #lm0, /*scratchpad=*/1>,
             %kernel: memref<5x5x3x32xf32, #lm0, /*scratchpad=*/1>,
             %output: memref<16x512x512x32xf32, #lm0, /*scratchpad=*/1>) {
  affine.for %b = 0 to %batch {
    affine.for %oh = 0 to %output_height {
      affine.for %ow = 0 to %output_width {
        affine.for %of = 0 to %output_feature {
          affine.for %kh = 0 to %kernel_height {
            affine.for %kw = 0 to %kernel_width {
              affine.for %if = 0 to %input_feature {
                // Calculate input indices.
                %1_0 = affine.apply #map1_0 (%0#1, %0#2, %0#4, %0#5)
                    [%h_stride, %w_stride, %h_kernel_dilation,
                     %w_kernel_dilation, %h_pad_low, %w_pad_low]
                %1_1 = affine.apply #map1_1 (%0#1, %0#2, %0#4, %0#5)
                    [%h_stride, %w_stride, %h_kernel_dilation,
                     %w_kernel_dilation, %h_pad_low, %w_pad_low]

                // Check if access is not in padding.
                affine.if #domain(%1_0, %1_1)
                    [%h_base_dilation, %w_kernel_dilation, %h_bound, %w_bound] {
                  %2_0 = affine.apply #map2_0 (%1_0, %1_1)
                  %2_1 = affine.apply #map2_1 (%1_0, %1_1)
                  // Compute: output[output_indices] += input[input_indices] * kernel[kernel_indices]
                  call @multiply_accumulate(%input, %kernel, %output, %b, %oh, %ow, %of, %kh, %kw, %if, %2_0, %2_1)
                }
              }
            }
          }
        }
      }
    }
  }
  return
}
```

TODO (Add more examples showing the IR for a variety of interesting cases)

## Design alternatives and extensions

This is a list of some design alternatives and extensions that we discussed in
detail but did not include in the spec, or postponed for future consideration on
demand. We will revisit these discussions when we have more implementation
experience and learn more about the challenges and limitations of our current
design in practice.

### Polyhedral code representation alternatives: schedule lists vs schedule trees vs affine loop/if forms

The current MLIR represents polyhedral schedules using a tree of if/for loops.
We extensively debated the tradeoffs involved in the typical unordered
polyhedral instruction representation (where each instruction has
multidimensional schedule information), discussed the benefits of schedule tree
forms, and eventually decided to go with a syntactic tree of affine if/else
conditionals and affine for loops. Discussion of the tradeoff was captured in
this document:
[MLIR: The case for a simplified polyhedral form](RationaleSimplifiedPolyhedralForm.md).

At a high level, we have two alternatives here:

1.  Schedule tree representation instead of an affine loop AST form: The current
    proposal uses an affine loop and conditional tree form, which is syntactic
    and has no separation of domains as sets and schedules as multidimensional
    affine functions. A schedule tree form however makes polyhedral domains and
    schedules a first class concept in the IR allowing compact expression of
    transformations through the schedule tree without changing the domains of
    instructions. Such a representation also hides prologues, epilogues, partial
    tiles, complex loop bounds and conditionals making loop nests free of
    "syntax". Cost models instead look at domains and schedules. In addition, if
    necessary such a domain schedule representation can be normalized to
    explicitly propagate the schedule into domains and model all the cleanup
    code. An example and more detail on the schedule tree form is in the next
    section.
1.  Having two different forms of "affine regions": an affine loop tree form and
    a polyhedral schedule tree form. In the latter, ops could carry attributes
    capturing domain, scheduling, and other polyhedral code generation options
    with IntegerSet, AffineMap, and other attributes.

#### Schedule Tree Representation for Affine Regions

This representation is based on a simplified form of the domain/schedule
representation used by the polyhedral compiler community. Domains represent what
has to be executed while schedules represent the order in which domain elements
are interleaved. We model domains as non-piece-wise convex integer sets, and
schedules as affine functions; however, the former can be disjunctive, and the
latter can be piece-wise affine relations. In the schedule tree representation,
domains and schedules for instructions are represented in a tree-like structure
which is called a schedule tree. Each non-leaf node of the tree is an abstract
polyhedral dimension corresponding to an abstract fused loop for each ML
instruction that appears in that branch. Each leaf node is an ML Instruction.

```mlir
// A tiled matmul code (128x128x128) represented in schedule tree form

// #map0 = (d0, d1, d2, d3, d4, d5) -> (128*d0 + d3, 128*d1 + d4, 128*d2 + d5)
#intset_ij = (i, j) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0, -j + N-1 >= 0
#intset_ijk = (i, j, k) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0,
                                    -j + M-1 >= 0, k >= 0, -k + N - 1 >= 0
func @matmul(%A, %B, %C, %M, %N, %K) : (...) { // %M, N, K are symbols
  // t1, t2, t3, t4, t5, t6 are abstract polyhedral loops
  mldim %t1 : {S1,S2,S3,S4,S5} floordiv (i, 128) {
    mldim %t2 : {S1,S2,S3,S4,S5} floordiv (j, 128) {
      // (%i, %j) = affine.apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
      call dma_mem_to_scratchpad(%C, %i, %j, %M, %N, %K)
        with @intset_ij(%i, %j) [%M, %N, %K]
      mldim %t3 : {S2,S3,S4,S5} floordiv (k, 128) {
        // (%i, %j, %k) = affine.apply (d0, d1, d2)
        //                          -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
        call dma_mem_to_scratchpad(%A, ...) with #inset_ijk (%i, %j, %k) [%M, %N, %K]
        // (%i, %j, %k) = affine.apply (d0, d1, d2)
        //                          -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
        call dma_mem_to_scratchpad(%B, ...) with #inset_ijk (%i, %j, %k) [%M, %N, %K]
        mldim %t4 : {S4} i mod 128 {
          mldim %t5 : {S4} j mod 128 {
            mldim %t6 : {S4} k mod 128 {
              // (%i, %j, %k) = affine.apply #map0 (%t1, %t2, %t3, %t4, %t5, %t6)
              call matmul_body(A, B, C, %i, %j, %k, %M, %N, %K)
                  with #inset_ijk(%i, %j, %k) [%M, %N, %K]
            } // end mldim t6
          } // end mldim t5
        } // end mldim t4
      } // end mldim t3
      // (%i, %j) = affine.apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
      call $dma_scratchpad_to_mem_C ... with #intset(%i, %j) [%M, %N, %K]
    } // end mldim t2
  } // end mldim t1
  return
}
```

### Affine Relations

The current MLIR spec includes affine maps and integer sets, but not affine
relations. Affine relations are a natural way to model read and write access
information, which can be very useful to capture the behavior of opaque external
library calls, high-performance vendor libraries, or user-provided / user-tuned
routines.

An affine relation is a relation between input and output dimension identifiers
while being symbolic on a list of symbolic identifiers and with affine
constraints on the identifiers.

Syntax:

```
// Affine relation definition at the top of file
affine-rel-def ::= affine-rel-id `=` affine-relation-inline

affine-rel-id ::= `##` prefixed-id

affine-relation-inline ::=
  `(` input-dims `)` (`[` symbols `]`)? `->`
  `(` output-dims `)` : affine-constraint-conjunction

input-dims ::= bare-id-list
output-dims ::= bare-id-list
symbols ::= bare-id-list

affine-rel ::= affine-rel-id | affine-relation-inline

// Usage
affine-rel-spec ::= affine-rel dim-and-symbol-use-list
```

All identifiers appearing in input-dims, output-dims, and symbol-dims are
pairwise distinct. All affine-constraint non-terminals in the above syntax are
allowed to contain identifiers only from input-dims, output-dims, and
symbol-dims.

Affine relations are used to model read, write, may_read, and may_write sets of
functions in the IR. The output dimension identifiers correspond to the data
dimensions.

Example:

```mlir
// read relation: two elements ( d0 <= r0 <= d0+1 )
##aff_rel9 = (d0) -> (r0) : r0 - d0 >= 0, d0 - r0 + 1 >= 0

func @count (%A : memref<128xf32>, %pos : i32) -> f32
  reads: {%A ##aff_rel9 (%pos)}
  writes: /* empty */
  may_reads: /* empty */
  may_writes: /* empty */ {
^bb0 (%0, %1: memref<128xf32>, i64):
  %val = affine.load %A [%pos]
  %val2 = affine.load %A [%pos + 1]
  %p = mulf %val, %val2 : f32
  return %p : f32
}
```

### Regions

#### Making function definition an operation

MLIR supports values of a Function type. Instead of having a first-class IR
concept for functions, one could define an operation with a body region that
defines a function value. The particularity of functions is that their names are
globally visible and can be referred to before being defined, unlike SSA values
that must be defined first. Implementing a "function definition" operation would
require relaxing some of the SSA constraints in a region, and also making the IR
Module a region as well. It would also affect the core infrastructure (e.g.,
function passes) only for the sake of concept unification.

#### Having types on a region

Instead of inspecting the types of arguments of the first block, one could give
the region itself a type. This type would be redundant with block argument
types, which must be present anyway, and would create room for type mismatches.
While functions do have types that are partly redundant with the arguments of
the first block in the function, this is necessary to support function
declarations that do not have a body which we can refer to in order to obtain
the argument types. A region is always contained in an operation or a function
that can be queried to obtain the "type" of the region if necessary.

A type on a region can be justified if Regions were to be considered separately
from the enclosing entity (operation or function) and had their own semantics
that should be checked.

#### Attaching attributes to regions

Regions could be annotated with dialect attributes to use attribute verification
hooks. An operation could take multiple regions as arguments, and each of them
may require different attributes. However, there are currently very few
practical cases where this would be necessary. Instead, one could simulate
per-region attributes with array attributes attached to the entity containing
the region (operation or function). This decreases the overall complexity of the
IR and enables more concise and op-specific forms, e.g., when all regions of an
op have the same attribute, which can then be mentioned only once. Since the
semantics of the region is entirely defined by the enclosing entity, it also
makes sense to have attributes attached to that entity rather than to the region
itself.

This can be reconsidered in the future if we see a non-negligible amount of use
cases.

### Read/Write/May_Read/May_Write sets for External Functions

Having read, write, may_read, and may_write sets for external functions
(including opaque ones, high-performance vendor libraries such as CuDNN, CuB,
MKL, FFT libraries, user-provided/optimized functions, or data movement runtimes
such as DMA ones) is a powerful feature. It allows the compiler to perform
analysis and composition/transformation in the presence of such calls and with
loops around such calls on sub-tensors. For user-provided or custom hand-tuned
functions, the read/write/may_read/may_write sets could be provided a priori by
a user as part of the external function signature or they could be part of a
database.

TODO: Design this, and update to use function attribute syntax.

Example:

```mlir
##rel9 ( ) [s0] -> (r0, r1) : 0 <= r0 <= 1023, 0 <= r1 <= s0 - 1

func @cblas_reduce_ffi(%M: memref<1024 x ? x f32, #layout_map0, /*mem=*/0>)
  -> f32 [
  reads: {%M, ##rel9() }
  writes: /* empty */
  may_reads: /* empty */
  may_writes: /* empty */
]

func @dma_mem_to_scratchpad(%a : memref<1024 x f32, #layout_map0, /*mem=*/0>,
                            %b : memref<1024 x f32, #layout_map0, 1>,
                            %c : memref<1024 x f32, #layout_map0>) [
  reads: {%M, ##rel9() }
  writes: /* empty */
  may_reads: /* empty */
  may_writes: /* empty */
]
```

### Memref Extensions

1.  Arbitrary polyhedral shapes for tensors: e.g., triangular shapes in tensor
    dimensions where there is symmetry: use integer set (affine constraints) to
    model tensor data space (instead of just extents). Requires some changes to
    the IR and the in-memory form.
1.  Layout maps

    1.  Allow piece-wise affine maps for layouts: allows clean modeling of
        boundary cases for images/tensors through padding, wrapping, mirroring,
        padding where padded values are the results of computation as opposed to
        data, padding in the interior as opposed to just boundaries.
    1.  Allow many-to-one layout maps: Index and layout maps in the current
        proposal are bijective. Extending them to many-to-one layout maps allows
        cleaner(?) modeling of broadcast/reduce style computations while reusing
        memory.

Proposal 2(a) requires non-trivial changes to the IR and the in-memory
representation. 2(b) requires no change, but impacts how cost models look at
index and layout maps.

### `affine.if` and `affine.for` Extensions for "Escaping Scalars"

We considered providing a representation for SSA values that are live out of
`if/else` conditional bodies and loop carried in `affine.for` loops. We
ultimately abandoned this approach due to its complexity. In the current design
of MLIR, scalar variables cannot escape for loops or if instructions. In
situations where escaping is necessary, we use zero-dimensional tensors and
memrefs instead of scalars.

**TODO**: This whole section is obsolete and should be updated to use block
arguments and a yield-like terminator in for/if instructions.

The abandoned design of supporting escaping scalars is as follows:

#### affine.for Instruction

Syntax:

```
[<out-var-list> =]
for %<index-variable-name> = <lower-bound> ... <upper-bound> step <step>
  [with <in-var-list>] { <loop-instruction-list> }
```

out-var-list is a comma-separated list of SSA values defined in the loop body
and used outside the loop body. in-var-list is a comma-separated list of SSA
values used inside the loop body and their initializers. loop-instruction-list
is a list of instructions that may also include a yield instruction.

Example:

```mlir
// Return the sum of the elements in the 1-dimensional memref A.
func i32 @sum(%A : memref<?xi32>, %N : i32) -> (i32) {
  %init = 0
  %result = affine.for %i = 0 to N with %tmp(%init) {
    %value = affine.load %A[%i]
    %sum = %value + %tmp
    yield %sum
  }
  return %result : i32
}
```

#### affine.if/else Instruction

Syntax:

```
<out-var-list> = affine.if (<cond-list>) {...} [else {...}]
```

Out-var-list is a list of SSA values defined by the if-instruction. The values
are arguments to the yield-instruction that occurs in both then and else clauses
when the else clause is present. When the if instruction contains only the if
clause, the escaping value defined in the then clause should be merged with the
value the variable had before the if instruction. The design captured here does
not handle this situation.

Example:

```mlir
// Compute the sum of half of the array
func i32 @sum_half(%A : memref<?xi32>, %N : i32) -> (i32) {
  %s0 = 0
  %s1 = affine.for %i = 1 ... N step 1 with %s2 (%s0) {
    %s3 = if (%i >= %N / 2) {
      %v0 = affine.load %A[%i]
      %s4 = %s2 + %v0
      yield %s4
    }
    yield %s3
  }
  return %s1 : i32
}
```

### Multithreading the compiler

People want compilers to go fast, and one simple way to do that is to
multi-thread them. There are multiple strategies for this, but a simple one is
to optimize and compile separate functions in parallel. LLVM's original pass
manager anticipated this demand, and the CallGraphSCCPass manager is even
designed to support this as well, but unfortunately, a few early design
decisions in LLVM prevent this from ever happening. Instead, things like ThinLTO
are forced to split programs into separate LLVM modules/contexts and optimize
those chunks independently.

The problem is that LLVM has several objects in its IR that are globally uniqued
and also mutable: notably constants like `i32 0`. In LLVM, these constants are
`Value`s, which allows them to be used as operands to instructions and means
that they also have SSA use lists. Because these things are uniqued, every
`i32 0` in any function shares a use list. This means that optimizing multiple
functions in parallel won't work (at least without some sort of synchronization
on the use lists, which would be unbearably inefficient).

MLIR now supports a multithreaded pass manager. We do this through several
design choices:

1.  MLIR makes extensive use of uniqued immutable data structures (affine
    expressions, types, etc. are all immutable, uniqued, and immortal).
2.  Constants are defined in per-function pools, instead of being globally
    uniqued.
3.  Functions themselves are not SSA values either, so they don't have the same
    problem as constants.
4.  FunctionPasses are copied (through their copy ctor) into one instance per
    thread, avoiding sharing of local state across threads.

This allows MLIR function passes to support efficient multithreaded compilation
and code generation.
|