annotate mlir/docs/Rationale.md @ 164:fdfabb438fbf

...
author anatofuz
date Thu, 19 Mar 2020 17:02:53 +0900
parents 1d019706d866
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
150
anatofuz
parents:
diff changeset
1 # MLIR Rationale
anatofuz
parents:
diff changeset
2
anatofuz
parents:
diff changeset
3 This document is intended to capture some of the alternatives considered and
anatofuz
parents:
diff changeset
4 open debates in the design of MLIR, along with the rationale for certain
anatofuz
parents:
diff changeset
5 decisions we made. This is not intended to be a "finely groomed" document - we
anatofuz
parents:
diff changeset
6 prefer the ability to dump in interesting tidbits without worrying too much
anatofuz
parents:
diff changeset
7 about their consistency or readability.
anatofuz
parents:
diff changeset
8
anatofuz
parents:
diff changeset
9 [TOC]
anatofuz
parents:
diff changeset
10
anatofuz
parents:
diff changeset
11 ## Abstract
anatofuz
parents:
diff changeset
12
anatofuz
parents:
diff changeset
13 MLIR is a compiler intermediate representation with similarities to traditional
anatofuz
parents:
diff changeset
14 three-address SSA representations (like
anatofuz
parents:
diff changeset
15 [LLVM IR](http://llvm.org/docs/LangRef.html) or
anatofuz
parents:
diff changeset
16 [SIL](https://github.com/apple/swift/blob/master/docs/SIL.rst)), but which
anatofuz
parents:
diff changeset
17 introduces notions from the polyhedral loop optimization works as first class
anatofuz
parents:
diff changeset
18 concepts. This hybrid design is optimized to represent, analyze, and transform
anatofuz
parents:
diff changeset
19 high level dataflow graphs as well as target-specific code generated for high
anatofuz
parents:
diff changeset
20 performance data parallel systems. Beyond its representational capabilities, its
anatofuz
parents:
diff changeset
21 single continuous design provides a framework to lower from dataflow graphs to
anatofuz
parents:
diff changeset
22 high performance target specific code.
anatofuz
parents:
diff changeset
23
anatofuz
parents:
diff changeset
24 MLIR stands for one of "Multi-Level IR" or "Multi-dimensional Loop IR" or
anatofuz
parents:
diff changeset
25 "Machine Learning IR" or "Mid Level IR", we prefer the first. This document only
anatofuz
parents:
diff changeset
26 provides the rationale behind MLIR -- its actual
anatofuz
parents:
diff changeset
27 [specification document](LangRef.md) and other content is hosted elsewhere.
anatofuz
parents:
diff changeset
28
anatofuz
parents:
diff changeset
29 ## Introduction and Motivation
anatofuz
parents:
diff changeset
30
anatofuz
parents:
diff changeset
31 The Multi-Level Intermediate Representation (MLIR) is intended for easy
anatofuz
parents:
diff changeset
32 expression and optimization of computations involving deep loop nests and dense
anatofuz
parents:
diff changeset
33 matrices of high dimensionality. It is thus well-suited to deep learning
anatofuz
parents:
diff changeset
34 computations in particular. Yet it is general enough to also represent arbitrary
anatofuz
parents:
diff changeset
35 sequential computation. The representation allows high-level optimization and
anatofuz
parents:
diff changeset
36 parallelization for a wide range of parallel architectures including those with
anatofuz
parents:
diff changeset
37 deep memory hierarchies --- general-purpose multicores, GPUs, and specialized
anatofuz
parents:
diff changeset
38 neural network accelerators.
anatofuz
parents:
diff changeset
39
anatofuz
parents:
diff changeset
40 MLIR uses ideas drawn from IRs of LLVM and Swift for lower level constructs
anatofuz
parents:
diff changeset
41 while combining them with ideas from the polyhedral abstraction to represent
anatofuz
parents:
diff changeset
42 loop nests, multidimensional data (tensors), and transformations on these
anatofuz
parents:
diff changeset
43 entities as first class concepts in the IR.
anatofuz
parents:
diff changeset
44
anatofuz
parents:
diff changeset
45 MLIR is a multi-level IR, i.e., it represents code at a domain-specific
anatofuz
parents:
diff changeset
46 representation such as HLO or TensorFlow graphs, all the way down to the machine
anatofuz
parents:
diff changeset
47 level. MLIR is able to represent arbitrary control flow and arbitrary data
anatofuz
parents:
diff changeset
48 accesses, and is general enough to represent nearly all sequential computation.
anatofuz
parents:
diff changeset
49 This is a key distinction from existing polyhedral representation
anatofuz
parents:
diff changeset
50 implementations (such as LLVM [Polly](https://polly.llvm.org/)) that are able to
anatofuz
parents:
diff changeset
51 use the polyhedral abstraction in a way isolated from the LLVM IR and only for
anatofuz
parents:
diff changeset
52 affine loop nests, i.e., portions of the code where array accesses, loop bounds,
anatofuz
parents:
diff changeset
53 and conditionals are regular (involve linear functions of loop iterators and
anatofuz
parents:
diff changeset
54 constant symbols). The presence of statically unpredictable data accesses or
anatofuz
parents:
diff changeset
55 control flow does not preclude representation in MLIR, but only limits to a
anatofuz
parents:
diff changeset
56 certain extent the ability to reason about and apply transformations using the
anatofuz
parents:
diff changeset
57 polyhedral abstraction.
anatofuz
parents:
diff changeset
58
anatofuz
parents:
diff changeset
59 Maps, sets, and relations with affine constraints are the core structures
anatofuz
parents:
diff changeset
60 underlying a polyhedral representation of high-dimensional loop nests and
anatofuz
parents:
diff changeset
61 multidimensional arrays. These structures are represented as textual
anatofuz
parents:
diff changeset
62 expressions in a form close to their mathematical form. These structures are
anatofuz
parents:
diff changeset
63 used to capture loop nests, tensor data structures, and how they are reordered
anatofuz
parents:
diff changeset
64 and mapped for a target architecture. All structured or "conforming" loops are
anatofuz
parents:
diff changeset
65 captured as part of the polyhedral information, and so are tensor variables,
anatofuz
parents:
diff changeset
66 their layouts, and subscripted accesses to these tensors in memory.
anatofuz
parents:
diff changeset
67
anatofuz
parents:
diff changeset
68 The information captured in the IR allows a compact expression of all loop
anatofuz
parents:
diff changeset
69 transformations, data remappings, explicit copying necessary for explicitly
anatofuz
parents:
diff changeset
70 addressed memory in accelerators, mapping to pre-tuned expert written
anatofuz
parents:
diff changeset
71 primitives, and mapping to specialized vector instructions. Loop transformations
anatofuz
parents:
diff changeset
72 that can be easily implemented include the body of affine transformations: these
anatofuz
parents:
diff changeset
73 subsume all traditional loop transformations (unimodular and non-unimodular)
anatofuz
parents:
diff changeset
74 such as loop tiling, interchange, permutation, skewing, scaling, relative
anatofuz
parents:
diff changeset
75 shifting, reversal, fusion, and distribution/fission. Transformations on data
anatofuz
parents:
diff changeset
76 layout such as padding and transforming to blocked layouts are also represented
anatofuz
parents:
diff changeset
77 well via affine layout maps.
anatofuz
parents:
diff changeset
78
anatofuz
parents:
diff changeset
79 MLIR's design allows a progressive lowering to target-specific forms. Besides
anatofuz
parents:
diff changeset
80 high-level transformations for loop nests and data layouts that a typical
anatofuz
parents:
diff changeset
81 mid-level optimizer is expected to deal with, MLIR is also designed to perform
anatofuz
parents:
diff changeset
82 certain low-level scheduling and mapping decisions that a typical backend IR is
anatofuz
parents:
diff changeset
83 entrusted with: these include mapping to specialized vector instructions,
anatofuz
parents:
diff changeset
84 auto-vectorization, and software pipelining. The need to support these
anatofuz
parents:
diff changeset
85 transformations stems from the fact that neural network accelerators have
anatofuz
parents:
diff changeset
86 specialized units that deal with large chunks of data whose computation maps
anatofuz
parents:
diff changeset
87 back to chunks of more than one loop of the loop nests as viewed by a program at
anatofuz
parents:
diff changeset
88 a level closer to the original specification. Such specialized units or
anatofuz
parents:
diff changeset
89 instructions operate on multidimensional data chunks from a programmer's
anatofuz
parents:
diff changeset
90 viewpoint. It thus makes it hard or infeasible for a backend operating on a very
anatofuz
parents:
diff changeset
91 low-level IR close to assembly to lift and reconstruct loops and perform such a
anatofuz
parents:
diff changeset
92 mapping. This is in contrast to classic instruction selection and scheduling in
anatofuz
parents:
diff changeset
93 today's compilers that primarily only deals with the body of the innermost loop.
anatofuz
parents:
diff changeset
94 MLIR also facilitates automatic mapping to expert pre-tuned primitives or vendor
anatofuz
parents:
diff changeset
95 libraries operating on data at higher levels (or at the highest level) of the
anatofuz
parents:
diff changeset
96 memory hierarchy.
anatofuz
parents:
diff changeset
97
anatofuz
parents:
diff changeset
98 In summary, MLIR is convenient for and closed under the kind of transformations
anatofuz
parents:
diff changeset
99 needed to lower to general-purpose as well as specialized accelerators. It also
anatofuz
parents:
diff changeset
100 allows one to build modular and reusable target independent and target dependent
anatofuz
parents:
diff changeset
101 passes.
anatofuz
parents:
diff changeset
102
anatofuz
parents:
diff changeset
103 ## Design Decisions
anatofuz
parents:
diff changeset
104
anatofuz
parents:
diff changeset
105 This section sheds light on some of the design decisions -- some of these are
anatofuz
parents:
diff changeset
106 indirectly implied by the specification document.
anatofuz
parents:
diff changeset
107
anatofuz
parents:
diff changeset
108 ### Loads and stores
anatofuz
parents:
diff changeset
109
anatofuz
parents:
diff changeset
110 The 'load' and 'store' instructions are specifically crafted to fully resolve to
anatofuz
parents:
diff changeset
111 an element of a memref. These instructions take as arguments n+1 indices for an
anatofuz
parents:
diff changeset
112 n-ranked tensor. This disallows the equivalent of pointer arithmetic or the
anatofuz
parents:
diff changeset
113 ability to index into the same memref in other ways (something which C arrays
anatofuz
parents:
diff changeset
114 allow for example). Furthermore, for the affine constructs, the compiler can
anatofuz
parents:
diff changeset
115 follow use-def chains (e.g. through
anatofuz
parents:
diff changeset
116 [affine.apply operations](Dialects/Affine.md#affineapply-operation)) or through
anatofuz
parents:
diff changeset
117 the map attributes of [affine operations](Dialects/Affine.md#Operations)) to
anatofuz
parents:
diff changeset
118 precisely analyze references at compile-time using polyhedral techniques. This
anatofuz
parents:
diff changeset
119 is possible because of the [restrictions on dimensions and symbols](Dialects/Affine.md#restrictions-on-dimensions-and-symbols).
anatofuz
parents:
diff changeset
120
anatofuz
parents:
diff changeset
121 A scalar of element-type (a primitive type or a vector type) that is stored in
anatofuz
parents:
diff changeset
122 memory is modeled as a 0-d memref. This is also necessary for scalars that are
anatofuz
parents:
diff changeset
123 live out of for loops and if conditionals in a function, for which we don't yet
anatofuz
parents:
diff changeset
124 have an SSA representation --
anatofuz
parents:
diff changeset
125 [an extension](#mlfunction-extensions-for-"escaping-scalars") to allow that is
anatofuz
parents:
diff changeset
126 described later in this doc.
anatofuz
parents:
diff changeset
127
anatofuz
parents:
diff changeset
128 ### Symbols and types
anatofuz
parents:
diff changeset
129
anatofuz
parents:
diff changeset
130 The current MLIR disallows use of symbols in types. For example, when a tensor
anatofuz
parents:
diff changeset
131 or memref dimension is statically unknown, it is denoted in the type as '?'. An
anatofuz
parents:
diff changeset
132 SSA symbol is then bound to it when a memref is created. The actual value of the
anatofuz
parents:
diff changeset
133 unknown dimension can be queried using the "dim" builtin as shown below.
anatofuz
parents:
diff changeset
134
anatofuz
parents:
diff changeset
135 Example:
anatofuz
parents:
diff changeset
136
anatofuz
parents:
diff changeset
137 ```mlir
anatofuz
parents:
diff changeset
138 func foo(...) {
anatofuz
parents:
diff changeset
139 %A = alloc <8x?xf32, #lmap> (%N)
anatofuz
parents:
diff changeset
140 ...
anatofuz
parents:
diff changeset
141 call bar(%A) : (memref<8x?xf32, #lmap>)
anatofuz
parents:
diff changeset
142 }
anatofuz
parents:
diff changeset
143
anatofuz
parents:
diff changeset
144 func bar(%A : memref<8x?xf32, #lmap>) {
anatofuz
parents:
diff changeset
145 // Type of %A indicates that %A has dynamic shape with 8 rows
anatofuz
parents:
diff changeset
146 // and unknown number of columns. The number of columns is queried
anatofuz
parents:
diff changeset
147 // dynamically using dim instruction.
anatofuz
parents:
diff changeset
148 %N = dim %A, 1 : memref<8x?xf32, #lmap>
anatofuz
parents:
diff changeset
149
anatofuz
parents:
diff changeset
150 affine.for %i = 0 to 8 {
anatofuz
parents:
diff changeset
151 affine.for %j = 0 to %N {
anatofuz
parents:
diff changeset
152 // A[i,j] += 1
anatofuz
parents:
diff changeset
153 %s1 = affine.load %A[%i, %j] : memref<8x?xf32, #lmap>
anatofuz
parents:
diff changeset
154 %s2 = add %s1, 1
anatofuz
parents:
diff changeset
155 affine.store %s2, %A[%i, %j] : memref<8x?xf32, #lmap>
anatofuz
parents:
diff changeset
156 }
anatofuz
parents:
diff changeset
157 }
anatofuz
parents:
diff changeset
158 return
anatofuz
parents:
diff changeset
159 }
anatofuz
parents:
diff changeset
160
anatofuz
parents:
diff changeset
161 ```
anatofuz
parents:
diff changeset
162
anatofuz
parents:
diff changeset
163 An alternative design is to embed the reference to symbols directly in the
anatofuz
parents:
diff changeset
164 type - memref<8x%Nxf32>. We went for the current approach in MLIR because it
anatofuz
parents:
diff changeset
165 simplifies the design --- types remain immutable when the values of symbols
anatofuz
parents:
diff changeset
166 change.
anatofuz
parents:
diff changeset
167
anatofuz
parents:
diff changeset
168 ### Block Arguments vs PHI nodes
anatofuz
parents:
diff changeset
169
anatofuz
parents:
diff changeset
170 MLIR Regions represent SSA using "[block arguments](LangRef.md#blocks)" rather
anatofuz
parents:
diff changeset
171 than [PHI instructions](http://llvm.org/docs/LangRef.html#i-phi) used in LLVM.
anatofuz
parents:
diff changeset
172 This choice is representationally identical (the same constructs can be
anatofuz
parents:
diff changeset
173 represented in either form) but block arguments have several advantages:
anatofuz
parents:
diff changeset
174
anatofuz
parents:
diff changeset
175 1. LLVM PHI nodes always have to be kept at the top of a block, and
anatofuz
parents:
diff changeset
176 transformations frequently have to manually skip over them. This is defined
anatofuz
parents:
diff changeset
177 away with BB arguments.
anatofuz
parents:
diff changeset
178 1. LLVM has a separate function Argument node. This is defined away with BB
anatofuz
parents:
diff changeset
179 arguments, because the arguments to the entry block serve this purpose.
anatofuz
parents:
diff changeset
180 1. Blocks of PHI nodes in LLVM execute atomically, which is surprising and
anatofuz
parents:
diff changeset
181 super confusing to compiler engineers and it is easy to introduce bugs with
anatofuz
parents:
diff changeset
182 this (very related to the
anatofuz
parents:
diff changeset
183 "[lost copy](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.524.5461&rep=rep1&type=pdf)"
anatofuz
parents:
diff changeset
184 problem in SSA lowering literature.) With the BB argument representation,
anatofuz
parents:
diff changeset
185 this confusion is defined away.
anatofuz
parents:
diff changeset
186 1. The entry list of PHI nodes in LLVM are unordered, and some blocks have
anatofuz
parents:
diff changeset
187 thousands of predecessors (e.g. unwind blocks). This can cause long compile
anatofuz
parents:
diff changeset
188 time problems because transformations have to linearly scan this list. This
anatofuz
parents:
diff changeset
189 is defined away with BB argument representation.
anatofuz
parents:
diff changeset
190 1. LLVM has no way to represent values that are available only in one successor
anatofuz
parents:
diff changeset
191 but not the other, e.g. its invoke instruction cannot produce the exception
anatofuz
parents:
diff changeset
192 value JUST on the exception edge. Instead, the
anatofuz
parents:
diff changeset
193 [landingpad instruction](http://llvm.org/docs/LangRef.html#landingpad-instruction)
anatofuz
parents:
diff changeset
194 is a hack used to represent this. MLIR doesn't make use of this capability,
anatofuz
parents:
diff changeset
195 but SIL uses it extensively, e.g. in the
anatofuz
parents:
diff changeset
196 [switch_enum instruction](https://github.com/apple/swift/blob/master/docs/SIL.rst#switch-enum).
anatofuz
parents:
diff changeset
197
anatofuz
parents:
diff changeset
198 For more context, block arguments were previously used in the Swift
anatofuz
parents:
diff changeset
199 [SIL Intermediate Representation](https://github.com/apple/swift/blob/master/docs/SIL.rst),
anatofuz
parents:
diff changeset
200 and described in
anatofuz
parents:
diff changeset
201 [a talk on YouTube](https://www.youtube.com/watch?v=Ntj8ab-5cvE). The section of
anatofuz
parents:
diff changeset
202 interest
anatofuz
parents:
diff changeset
203 [starts here](https://www.google.com/url?q=https://youtu.be/Ntj8ab-5cvE?t%3D596&sa=D&ust=1529450150971000&usg=AFQjCNFQHEWL7m8q3eO-1DiKw9zqC2v24Q).
anatofuz
parents:
diff changeset
204
anatofuz
parents:
diff changeset
205 ### Index type disallowed in vector/tensor/memref types
anatofuz
parents:
diff changeset
206
anatofuz
parents:
diff changeset
207 Index types are not allowed as elements of `vector`, `tensor` or `memref` type.
anatofuz
parents:
diff changeset
208 Index types are intended to be used for platform-specific "size" values and may
anatofuz
parents:
diff changeset
209 appear in subscripts, sizes of aggregate types and affine expressions. They are
anatofuz
parents:
diff changeset
210 also tightly coupled with `affine.apply` and affine.load/store operations;
anatofuz
parents:
diff changeset
211 having `index` type is a necessary precondition of a value to be acceptable by
anatofuz
parents:
diff changeset
212 these operations. While it may be useful to have `memref<?xindex>` to express
anatofuz
parents:
diff changeset
213 indirect accesses, e.g. sparse matrix manipulations or lookup tables, it creates
anatofuz
parents:
diff changeset
214 problems MLIR is not ready to address yet. MLIR needs to internally store
anatofuz
parents:
diff changeset
215 constants of aggregate types and emit code operating on values of those types,
anatofuz
parents:
diff changeset
216 which are subject to target-specific size and alignment constraints. Since MLIR
anatofuz
parents:
diff changeset
217 does not have a target description mechanism at the moment, it cannot reliably
anatofuz
parents:
diff changeset
218 emit such code. Moreover, some platforms may not support vectors of type
anatofuz
parents:
diff changeset
219 equivalent to `index`.
anatofuz
parents:
diff changeset
220
anatofuz
parents:
diff changeset
221 Indirect access use cases can be alternatively supported by providing and
anatofuz
parents:
diff changeset
222 `index_cast` instruction that allows for conversion between `index` and
anatofuz
parents:
diff changeset
223 fixed-width integer types, at the SSA value level. It has an additional benefit
anatofuz
parents:
diff changeset
224 of supporting smaller integer types, e.g. `i8` or `i16`, for small indices
anatofuz
parents:
diff changeset
225 instead of (presumably larger) `index` type.
anatofuz
parents:
diff changeset
226
anatofuz
parents:
diff changeset
227 ### Bit width of a non-primitive types and `index` is undefined
anatofuz
parents:
diff changeset
228
anatofuz
parents:
diff changeset
229 The bit width of a compound type is not defined by MLIR, it may be defined by a
anatofuz
parents:
diff changeset
230 specific lowering pass. In MLIR, bit width is a property of certain primitive
anatofuz
parents:
diff changeset
231 _type_, in particular integers and floats. It is equal to the number that
anatofuz
parents:
diff changeset
232 appears in the type definition, e.g. the bit width of `i32` is `32`, so is the
anatofuz
parents:
diff changeset
233 bit width of `f32`. The bit width is not _necessarily_ related to the amount of
anatofuz
parents:
diff changeset
234 memory (in bytes) or the size of register (in bits) that is necessary to store
anatofuz
parents:
diff changeset
235 the value of the given type. These quantities are target and ABI-specific and
anatofuz
parents:
diff changeset
236 should be defined during the lowering process rather than imposed from above.
anatofuz
parents:
diff changeset
237 For example, `vector<3xi57>` is likely to be lowered to a vector of four 64-bit
anatofuz
parents:
diff changeset
238 integers, so that its storage requirement is `4 x 64 / 8 = 32` bytes, rather
anatofuz
parents:
diff changeset
239 than `(3 x 57) ceildiv 8 = 22` bytes as can be naively computed from the
anatofuz
parents:
diff changeset
240 bitwidth. Individual components of MLIR that allocate space for storing values
anatofuz
parents:
diff changeset
241 may use the bit size as the baseline and query the target description when it is
anatofuz
parents:
diff changeset
242 introduced.
anatofuz
parents:
diff changeset
243
anatofuz
parents:
diff changeset
244 The bit width is not defined for dialect-specific types at MLIR level. Dialects
anatofuz
parents:
diff changeset
245 are free to define their own quantities for type sizes.
anatofuz
parents:
diff changeset
246
anatofuz
parents:
diff changeset
247 ### Signless types
anatofuz
parents:
diff changeset
248
anatofuz
parents:
diff changeset
249 Integers in the builtin MLIR type system have a bitwidth (note that the `index`
anatofuz
parents:
diff changeset
250 type has a symbolic width equal to the machine word size), but they do not have
anatofuz
parents:
diff changeset
251 an intrinsic sign. This means that the "standard ops" operation set has things
anatofuz
parents:
diff changeset
252 like `addi` and `muli` which do two's complement arithmetic, but some other
anatofuz
parents:
diff changeset
253 operations get a sign, e.g. `divis` vs `diviu`.
anatofuz
parents:
diff changeset
254
anatofuz
parents:
diff changeset
255 LLVM uses the [same design](http://llvm.org/docs/LangRef.html#integer-type),
anatofuz
parents:
diff changeset
256 which was introduced in a revamp rolled out
anatofuz
parents:
diff changeset
257 [in the LLVM 2.0 integer type](http://releases.llvm.org/2.0/docs/LangRef.html#t_derived).
anatofuz
parents:
diff changeset
258 Prior to that, from
anatofuz
parents:
diff changeset
259 [LLVM 1.0](http://releases.llvm.org/1.0/docs/LangRef.html#t_classifications) to
anatofuz
parents:
diff changeset
260 [1.9](http://releases.llvm.org/1.9/docs/LangRef.html#t_classifications), LLVM
anatofuz
parents:
diff changeset
261 uses signed types like "sbyte" and "ubyte". This shift was important and has
anatofuz
parents:
diff changeset
262 served LLVM well over the years. The reason this is important is that it is a
anatofuz
parents:
diff changeset
263 good thing for an intermediate representation to represent the same computation
anatofuz
parents:
diff changeset
264 with the same instruction. Signed types got in the way, because (e.g.) an "add
anatofuz
parents:
diff changeset
265 of an sbyte" does the same computation as an "add of a ubyte", but the type
anatofuz
parents:
diff changeset
266 system made them look artificially different. This split also required casts
anatofuz
parents:
diff changeset
267 like "cast from sbyte to ubyte" which do nothing at the machine level. Removing
anatofuz
parents:
diff changeset
268 signs from the type system eliminated these problems, making the compiler
anatofuz
parents:
diff changeset
269 simpler.
anatofuz
parents:
diff changeset
270
anatofuz
parents:
diff changeset
271 More information about this split is available in an old
anatofuz
parents:
diff changeset
272 [talk on youtube](https://www.youtube.com/watch?v=VeRaLPupGks) talking about
anatofuz
parents:
diff changeset
273 LLVM 2.0.
anatofuz
parents:
diff changeset
274
anatofuz
parents:
diff changeset
275 Note that this rationale only applies to the "standard ops" dialect in which we
anatofuz
parents:
diff changeset
276 can express an opinion about its design. Other dialects generally try to model
anatofuz
parents:
diff changeset
277 an external system, and should aim to reflect its design as closely as possible.
anatofuz
parents:
diff changeset
278
anatofuz
parents:
diff changeset
279 ### Splitting floating point vs integer operations
anatofuz
parents:
diff changeset
280
anatofuz
parents:
diff changeset
281 The MLIR "standard" operation set splits many integer and floating point
anatofuz
parents:
diff changeset
282 operations into different categories, for example `addf` vs `addi` and `cmpf` vs
anatofuz
parents:
diff changeset
283 `cmpi`
anatofuz
parents:
diff changeset
284 ([following the design of LLVM](http://llvm.org/docs/LangRef.html#binary-operations)).
anatofuz
parents:
diff changeset
285 These instructions _are_ polymorphic on the number of elements in the type
anatofuz
parents:
diff changeset
286 though, for example `addf` is used with scalar floats, vectors of floats, and
anatofuz
parents:
diff changeset
287 tensors of floats (LLVM does the same thing with its scalar/vector types).
anatofuz
parents:
diff changeset
288
anatofuz
parents:
diff changeset
289 This split is important because floating point and integer operations are quite
anatofuz
parents:
diff changeset
290 different in practice: for example, floating point values include NaN's, so
anatofuz
parents:
diff changeset
291 [integer comparisons](http://llvm.org/docs/LangRef.html#icmp-instruction) and
anatofuz
parents:
diff changeset
292 [floating point comparisons](http://llvm.org/docs/LangRef.html#fcmp-instruction)
anatofuz
parents:
diff changeset
293 should use different comparison opcodes. On the arithmetic side of things,
anatofuz
parents:
diff changeset
294 floating point operations support rounding modes, floating point contractions,
anatofuz
parents:
diff changeset
295 ["fast math"](http://llvm.org/docs/LangRef.html#fadd-instruction), and integers
anatofuz
parents:
diff changeset
296 may want to have two's complement overflow behavior or be undefined on
anatofuz
parents:
diff changeset
297 [various forms of wrapping](http://llvm.org/docs/LangRef.html#add-instruction)
anatofuz
parents:
diff changeset
298 for performance.
anatofuz
parents:
diff changeset
299
anatofuz
parents:
diff changeset
300 We are a long way from this sort of thing being a priority to care about in
anatofuz
parents:
diff changeset
301 MLIR, but since we have experience and know the right way to do this, we'd
anatofuz
parents:
diff changeset
302 rather design it in from the beginning.
anatofuz
parents:
diff changeset
303
anatofuz
parents:
diff changeset
304 Note that this rationale only applies to the "standard ops" dialect in which we
anatofuz
parents:
diff changeset
305 can express an opinion about its design. Other dialects generally try to model
anatofuz
parents:
diff changeset
306 an external system, and should aim to reflect its design as closely as possible.
anatofuz
parents:
diff changeset
307
anatofuz
parents:
diff changeset
308 ### Specifying sign in integer comparison operations
anatofuz
parents:
diff changeset
309
anatofuz
parents:
diff changeset
310 Since integers are [signless](#signless-types), it is necessary to define the
anatofuz
parents:
diff changeset
311 sign for integer comparison operations. This sign indicates how to treat the
anatofuz
parents:
diff changeset
312 foremost bit of the integer: as sign bit or as most significant bit. For
anatofuz
parents:
diff changeset
313 example, comparing two `i4` values `0b1000` and `0b0010` yields different
anatofuz
parents:
diff changeset
314 results for unsigned (`8 > 3`) and signed (`-8 < 3`) interpretations. This
anatofuz
parents:
diff changeset
315 difference is only significant for _order_ comparisons, but not for _equality_
anatofuz
parents:
diff changeset
316 comparisons. Indeed, for the latter all bits must have the same value
anatofuz
parents:
diff changeset
317 independently of the sign. Since both arguments have exactly the same bit width
anatofuz
parents:
diff changeset
318 and cannot be padded by this operation, it is impossible to compare two values
anatofuz
parents:
diff changeset
319 whose bit representations would differ while the values are interpreted as
anatofuz
parents:
diff changeset
320 equal.
anatofuz
parents:
diff changeset
321
anatofuz
parents:
diff changeset
322 ### Specifying comparison kind as attribute
anatofuz
parents:
diff changeset
323
anatofuz
parents:
diff changeset
324 Unlike arithmetic, comparison operators share several common properties, e.g.
anatofuz
parents:
diff changeset
325 they cannot be considered associative. In practice, comparisons are sometimes
anatofuz
parents:
diff changeset
326 implemented by the same instruction or its variants so it makes sense to group
anatofuz
parents:
diff changeset
327 them together at the IR level.
anatofuz
parents:
diff changeset
328
anatofuz
parents:
diff changeset
329 An alternative would be introducing ten distinct operators for all currently
anatofuz
parents:
diff changeset
330 supported kinds of integer comparisons. These operators would have increased the
anatofuz
parents:
diff changeset
331 number of "reserved" names used by standard operations as well as the size of
anatofuz
parents:
diff changeset
332 the C++ API while their implementations would have been mostly identical.
anatofuz
parents:
diff changeset
333
anatofuz
parents:
diff changeset
334 The comparison kind is internally an integer attribute. However, for the sake of
anatofuz
parents:
diff changeset
335 readability by humans, custom assembly form accepts string literals that are
anatofuz
parents:
diff changeset
336 mapped to the underlying integer values: `cmpi "eq", %lhs, %rhs` better implies
anatofuz
parents:
diff changeset
337 integer equality comparison than `cmpi 0, %lhs, %rhs` where it is unclear what
anatofuz
parents:
diff changeset
338 gets compared to what else. This syntactic sugar is possible thanks to parser
anatofuz
parents:
diff changeset
339 logic redefinitions for custom assembly form of non-builtin operations.
anatofuz
parents:
diff changeset
340 Supporting it in the full notation would have required changing how the main
anatofuz
parents:
diff changeset
341 parsing algorithm works and may have unexpected repercussions. While it had been
anatofuz
parents:
diff changeset
342 possible to store the predicate as string attribute, it would have rendered
anatofuz
parents:
diff changeset
343 impossible to implement switching logic based on the comparison kind and made
anatofuz
parents:
diff changeset
344 attribute validity checks (one out of ten possible kinds) more complex.
anatofuz
parents:
diff changeset
345
anatofuz
parents:
diff changeset
346 ### 'select' operation to implement min/max
anatofuz
parents:
diff changeset
347
anatofuz
parents:
diff changeset
348 Although `min` and `max` operations are likely to occur as a result of
anatofuz
parents:
diff changeset
349 transforming affine loops in ML functions, we did not make them first-class
anatofuz
parents:
diff changeset
350 operations. Instead, we provide the `select` operation that can be combined with
anatofuz
parents:
diff changeset
351 `cmpi` to implement the minimum and maximum computation. Although they now
anatofuz
parents:
diff changeset
352 require two operations, they are likely to be emitted automatically during the
anatofuz
parents:
diff changeset
353 transformation inside MLIR. On the other hand, there are multiple benefits of
anatofuz
parents:
diff changeset
354 introducing `select`: standalone min/max would concern themselves with the
anatofuz
parents:
diff changeset
355 signedness of the comparison, already taken into account by `cmpi`; `select` can
anatofuz
parents:
diff changeset
356 support floats transparently if used after a float-comparison operation; the
anatofuz
parents:
diff changeset
357 lower-level targets provide `select`-like instructions making the translation
anatofuz
parents:
diff changeset
358 trivial.
anatofuz
parents:
diff changeset
359
anatofuz
parents:
diff changeset
360 This operation could have been implemented with additional control flow: `%r =
anatofuz
parents:
diff changeset
361 select %cond, %t, %f` is equivalent to
anatofuz
parents:
diff changeset
362
anatofuz
parents:
diff changeset
363 ```mlir
anatofuz
parents:
diff changeset
364 ^bb0:
anatofuz
parents:
diff changeset
365 cond_br %cond, ^bb1(%t), ^bb1(%f)
anatofuz
parents:
diff changeset
366 ^bb1(%r):
anatofuz
parents:
diff changeset
367 ```
anatofuz
parents:
diff changeset
368
anatofuz
parents:
diff changeset
369 However, this control flow granularity is not available in the ML functions
anatofuz
parents:
diff changeset
370 where min/max, and thus `select`, are likely to appear. In addition, simpler
anatofuz
parents:
diff changeset
371 control flow may be beneficial for optimization in general.
anatofuz
parents:
diff changeset
372
anatofuz
parents:
diff changeset
373 ### Regions
anatofuz
parents:
diff changeset
374
anatofuz
parents:
diff changeset
375 #### Attributes of type 'Block'
anatofuz
parents:
diff changeset
376
anatofuz
parents:
diff changeset
377 We considered representing regions through `ArrayAttr`s containing a list of a
anatofuz
parents:
diff changeset
378 special type `IRBlockAttr`, which in turn would contain a list of operations.
anatofuz
parents:
diff changeset
379 All attributes in MLIR are unique’d within the context, which would make the IR
anatofuz
parents:
diff changeset
380 inside the regions immortal for no good reason.
anatofuz
parents:
diff changeset
381
anatofuz
parents:
diff changeset
382 #### Use "inlined" functions as regions
anatofuz
parents:
diff changeset
383
anatofuz
parents:
diff changeset
384 We considered attaching a "force-inline" attribute on a function and/or a
anatofuz
parents:
diff changeset
385 function `call` operation. Even the minimal region support (use cases in
anatofuz
parents:
diff changeset
386 affine.for and affine.if existing before the regions) requires access to the
anatofuz
parents:
diff changeset
387 values defined in the dominating block, which is not supported by functions.
anatofuz
parents:
diff changeset
388 Conceptually, function bodies are instances of regions rather than the inverse;
anatofuz
parents:
diff changeset
389 regions can also be device kernels, alternative sections, etc.
anatofuz
parents:
diff changeset
390
anatofuz
parents:
diff changeset
391 #### Dedicated `region` operation
anatofuz
parents:
diff changeset
392
anatofuz
parents:
diff changeset
393 This would mean we have a special kind of operation that is allowed to have
anatofuz
parents:
diff changeset
394 regions while other operations are not. Such distinction is similar to the
anatofuz
parents:
diff changeset
395 Stmt/Op difference we have had and chose to remove to make the IR simpler and
anatofuz
parents:
diff changeset
396 more flexible. It would also require analyses and passes to consider the
anatofuz
parents:
diff changeset
397 interplay between operations (e.g., an `affine.for` operation must be followed
anatofuz
parents:
diff changeset
398 by a region operation). Finally, a region operation can be introduced using the
anatofuz
parents:
diff changeset
399 current implementation, among other operations and without being special in any
anatofuz
parents:
diff changeset
400 sense.
anatofuz
parents:
diff changeset
401
anatofuz
parents:
diff changeset
402 #### Explicit capture of the values used in a region
anatofuz
parents:
diff changeset
403
anatofuz
parents:
diff changeset
404 Being able to use values defined outside the region implies that use-def chains
anatofuz
parents:
diff changeset
405 may contain uses from different nested regions. Consequently, IR transformations
anatofuz
parents:
diff changeset
406 and analyses can pull the instruction defining the value across region
anatofuz
parents:
diff changeset
407 boundaries, for example in case of TableGen-defined canonicalization patterns.
anatofuz
parents:
diff changeset
408 This would not be the case if all used values had been passed as region
anatofuz
parents:
diff changeset
409 arguments. One of the motivations for introducing regions in the IR is precisely
anatofuz
parents:
diff changeset
410 to enable cross-region analyses and transformations that are simpler than
anatofuz
parents:
diff changeset
411 inter-procedural transformations. Having uses from different regions appear in
anatofuz
parents:
diff changeset
412 the same use-def chain, contrary to an additional data structure maintaining
anatofuz
parents:
diff changeset
413 correspondence between function call arguments as uses of the original
anatofuz
parents:
diff changeset
414 definitions and formal arguments as new definitions, enables such
anatofuz
parents:
diff changeset
415 simplification. Since individual operations now belong to blocks, which belong
anatofuz
parents:
diff changeset
416 to regions, it is always possible to check if the definition of the value
anatofuz
parents:
diff changeset
417 belongs to the same region as its particular use. The risk is that any IR
anatofuz
parents:
diff changeset
418 traversal will need to handle explicitly this situation and it is easy to forget
anatofuz
parents:
diff changeset
419 a check (or conversely it isn’t easy to design the right check in a tablegen
anatofuz
parents:
diff changeset
420 pattern for example): traversing use-def chains potentially crosses implicitly
anatofuz
parents:
diff changeset
421 semantic barriers, making it possible to unknowingly break region semantics.
anatofuz
parents:
diff changeset
422 This is expected to be caught in the verifier after the transformation.
anatofuz
parents:
diff changeset
423
anatofuz
parents:
diff changeset
424 At the same time, one may choose to pass certain or all values as region
anatofuz
parents:
diff changeset
425 arguments to explicitly break the use-def chains in the current proposal. This
anatofuz
parents:
diff changeset
426 can be combined with an attribute-imposed semantic requirement disallowing the
anatofuz
parents:
diff changeset
427 body of the region to refer to any value from outside it.
anatofuz
parents:
diff changeset
428
anatofuz
parents:
diff changeset
429 ### Quantized integer operations
anatofuz
parents:
diff changeset
430
anatofuz
parents:
diff changeset
431 We haven't designed integer quantized operations in MLIR, but experience from
anatofuz
parents:
diff changeset
432 TensorFlow suggests that it is better to put information about the quantization
anatofuz
parents:
diff changeset
433 range/scale into the type itself, rather than have a single type like "qint8"
anatofuz
parents:
diff changeset
434 and put these on attributes of the operation.
anatofuz
parents:
diff changeset
435
anatofuz
parents:
diff changeset
436 There are a few ways to do this with MLIR, including at least:
anatofuz
parents:
diff changeset
437
anatofuz
parents:
diff changeset
438 * We could do the same thing TensorFlow does - and we will _have_ to support
anatofuz
parents:
diff changeset
439 that model to some extent for compatibility.
anatofuz
parents:
diff changeset
440 * We can encode the fp range of quantized integers directly into the types
anatofuz
parents:
diff changeset
441 when they are constants. The best practice on this seems to be to encode the
anatofuz
parents:
diff changeset
442 zero point as well as a scale factor. This ensures that 0.0 is always
anatofuz
parents:
diff changeset
443 exactly representable, e.g. `qi8<-1.42, 31.23x>`.
anatofuz
parents:
diff changeset
444 * We could theoretically encode dynamically determined ranges into the types
anatofuz
parents:
diff changeset
445 using something like `qi8<?,?>` with the bounds being determined through the
anatofuz
parents:
diff changeset
446 SSA dataflow graph dynamically - similar to how dynamic shapes are handled.
anatofuz
parents:
diff changeset
447
anatofuz
parents:
diff changeset
448 We will definitely need to do #1 for compatibility, we probably want to do #2,
anatofuz
parents:
diff changeset
449 and we should investigate #3 over time. That said, our short term plan is to get
anatofuz
parents:
diff changeset
450 more implementation experience with the rest of the system first, then come back
anatofuz
parents:
diff changeset
451 to re-examine the representation for quantized arithmetic when we have that
anatofuz
parents:
diff changeset
452 experience. When we do, we should chat with benoitjacob@ and
anatofuz
parents:
diff changeset
453 [read the paper](https://arxiv.org/abs/1712.05877).
anatofuz
parents:
diff changeset
454
anatofuz
parents:
diff changeset
455 ### Dialect type extensions
anatofuz
parents:
diff changeset
456
anatofuz
parents:
diff changeset
457 This section describes the design decisions that shaped the dialect extensible
anatofuz
parents:
diff changeset
458 type system present in MLIR.
anatofuz
parents:
diff changeset
459
anatofuz
parents:
diff changeset
460 #### Reserving dialect type kinds
anatofuz
parents:
diff changeset
461
anatofuz
parents:
diff changeset
462 Dialects that wish to define type extensions must reserve a range of type kinds
anatofuz
parents:
diff changeset
463 within a '.def' file within the core IR library. This means that every dialect
anatofuz
parents:
diff changeset
464 wishing to define custom types must modify this file, but it guarantees that all
anatofuz
parents:
diff changeset
465 type casting checkings are performed in O(1) time.
anatofuz
parents:
diff changeset
466
anatofuz
parents:
diff changeset
467 #### Interactions between dialects
anatofuz
parents:
diff changeset
468
anatofuz
parents:
diff changeset
469 There are two different interactions between dialects that are important to
anatofuz
parents:
diff changeset
470 understand. When types of a dialect are:
anatofuz
parents:
diff changeset
471
anatofuz
parents:
diff changeset
472 * In operations of other dialects
anatofuz
parents:
diff changeset
473
anatofuz
parents:
diff changeset
474 - For standard/builtin operations, only standard/builtin types are
anatofuz
parents:
diff changeset
475 allowed. This restriction allows for operations to clearly understand
anatofuz
parents:
diff changeset
476 the invariants that they are working under.
anatofuz
parents:
diff changeset
477 - Outside of standard/builtin operations, dialects are expected to verify
anatofuz
parents:
diff changeset
478 the allowable operation types per operation.
anatofuz
parents:
diff changeset
479
anatofuz
parents:
diff changeset
480 * In types of other dialects
anatofuz
parents:
diff changeset
481
anatofuz
parents:
diff changeset
482 - For standard/builtin types, these types are allowed to contain types
anatofuz
parents:
diff changeset
483 from other dialects. This simplifies the type system and removes the
anatofuz
parents:
diff changeset
484 need for dialects to redefine all of the standard aggregate types, e.g.
anatofuz
parents:
diff changeset
485 tensor, as well as the memref type. Dialects are expected to verify that
anatofuz
parents:
diff changeset
486 a specific type is valid within a standard type, e.g. if a type can be
anatofuz
parents:
diff changeset
487 an element of a tensor.
anatofuz
parents:
diff changeset
488 - For dialect types, the dialect is expected to verify any type
anatofuz
parents:
diff changeset
489 invariants, e.g. if the standard tensor type can contain a specific type
anatofuz
parents:
diff changeset
490 of that dialect.
anatofuz
parents:
diff changeset
491
anatofuz
parents:
diff changeset
492 #### Separating builtin and standard types
anatofuz
parents:
diff changeset
493
anatofuz
parents:
diff changeset
494 Following the separation between the built-in and standard dialect, it makes
anatofuz
parents:
diff changeset
495 sense to separate built-in types and standard dialect types. Built-in types are
anatofuz
parents:
diff changeset
496 required for the validity of the IR itself, e.g. the function type (which
anatofuz
parents:
diff changeset
497 appears in function signatures and generic assembly forms of operations).
anatofuz
parents:
diff changeset
498 Integer, float, vector, memref and tensor types, while important, are not
anatofuz
parents:
diff changeset
499 necessary for IR validity.
anatofuz
parents:
diff changeset
500
anatofuz
parents:
diff changeset
501 #### Unregistered types
anatofuz
parents:
diff changeset
502
anatofuz
parents:
diff changeset
503 MLIR supports unregistered operations in generic assembly form. MLIR also
anatofuz
parents:
diff changeset
504 supports a similar concept for types. When parsing, if the dialect for dialect
anatofuz
parents:
diff changeset
505 type has not been registered the type is modeled as an 'OpaqueType'. This allows
anatofuz
parents:
diff changeset
506 for types to be round-tripped without needing to link in the dialect library
anatofuz
parents:
diff changeset
507 that defined them. No additional information about opaque types, outside of
anatofuz
parents:
diff changeset
508 parsing/printing, will be available.
anatofuz
parents:
diff changeset
509
anatofuz
parents:
diff changeset
510 #### Dialect type syntax
anatofuz
parents:
diff changeset
511
anatofuz
parents:
diff changeset
512 Dialect extended types are represented as string literals wrapped inside of the
anatofuz
parents:
diff changeset
513 dialect namespace. This means that the parser delegates to the dialect for
anatofuz
parents:
diff changeset
514 parsing specific type instances. This differs from the representation of dialect
anatofuz
parents:
diff changeset
515 defined operations, of which have an identifier name that the parser uses to
anatofuz
parents:
diff changeset
516 identify and parse them.
anatofuz
parents:
diff changeset
517
anatofuz
parents:
diff changeset
518 This representation was chosen for several reasons:
anatofuz
parents:
diff changeset
519
anatofuz
parents:
diff changeset
520 ##### Dialects must provide custom type parsers
anatofuz
parents:
diff changeset
521
anatofuz
parents:
diff changeset
522 Dialect type parsing cannot plug into the existing parser infrastructure as
anatofuz
parents:
diff changeset
523 operations do with the OpAsmParser/Printer. Operations have a defined syntax
anatofuz
parents:
diff changeset
524 structure that is the same across all dialects. Types, on the other hand, may
anatofuz
parents:
diff changeset
525 have many different, and sometimes conflicting, parsing constraints that would
anatofuz
parents:
diff changeset
526 be difficult/unmaintainable to provide within a single interface.
anatofuz
parents:
diff changeset
527
anatofuz
parents:
diff changeset
528 This also has the added benefit of encouraging dialects to reuse existing
anatofuz
parents:
diff changeset
529 external type parsers. For example, an LLVM dialect may provide an MLIR LLVM
anatofuz
parents:
diff changeset
530 type that is simply a wrapper around LLVM types. The LLVM dialect would then use
anatofuz
parents:
diff changeset
531 the existing LLVM type parsing infrastructure.
anatofuz
parents:
diff changeset
532
anatofuz
parents:
diff changeset
533 Example:
anatofuz
parents:
diff changeset
534
anatofuz
parents:
diff changeset
535 ```mlir
anatofuz
parents:
diff changeset
536 %s = "foo"() : () -> !llvm<"i32*">
anatofuz
parents:
diff changeset
537 ```
anatofuz
parents:
diff changeset
538
anatofuz
parents:
diff changeset
539 ##### Types do not always have canonical names
anatofuz
parents:
diff changeset
540
anatofuz
parents:
diff changeset
541 Unlike operations, types generally do not have a formal canonical name. For
anatofuz
parents:
diff changeset
542 example, function types have no defined keyword and integer types are defined by
anatofuz
parents:
diff changeset
543 a regular expression to support arbitrary bitwidth. Dialects with existing type
anatofuz
parents:
diff changeset
544 systems, e.g. LLVM, are likely to provide wrappers around their existing type
anatofuz
parents:
diff changeset
545 systems. For these wrapper types there is no simple canonical name, it's logical
anatofuz
parents:
diff changeset
546 to think of these types as existing within the namespace of the dialect. If a
anatofuz
parents:
diff changeset
547 dialect wishes to assign a canonical name to a type, it can be done via
anatofuz
parents:
diff changeset
548 [type aliases](LangRef.md#type-aliases).
anatofuz
parents:
diff changeset
549
anatofuz
parents:
diff changeset
550 ### Tuple types
anatofuz
parents:
diff changeset
551
anatofuz
parents:
diff changeset
552 The MLIR type system provides first class support for defining
anatofuz
parents:
diff changeset
553 [tuple types](LangRef.md#tuple-type). This is due to the fact that `Tuple`
anatofuz
parents:
diff changeset
554 represents a universal concept that is likely to, and has already begun to,
anatofuz
parents:
diff changeset
555 present itself in many different dialects. Though this type is first class in
anatofuz
parents:
diff changeset
556 the type system, it merely serves to provide a common mechanism in which to
anatofuz
parents:
diff changeset
557 represent this concept in MLIR. As such, MLIR provides no standard operations
anatofuz
parents:
diff changeset
558 for interfacing with `tuple` types. It is up to dialect authors to provide
anatofuz
parents:
diff changeset
559 operations, e.g. extract_tuple_element, to interpret and manipulate them. When
anatofuz
parents:
diff changeset
560 possible, operations should prefer to use multiple results instead. These
anatofuz
parents:
diff changeset
561 provide a myriad of benefits, such as alleviating any need for tuple-extract
anatofuz
parents:
diff changeset
562 operations that merely get in the way of analysis and transformation.
anatofuz
parents:
diff changeset
563
anatofuz
parents:
diff changeset
564 ### Assembly forms
anatofuz
parents:
diff changeset
565
anatofuz
parents:
diff changeset
566 MLIR decides to support both generic and custom assembly forms under the
anatofuz
parents:
diff changeset
567 following considerations:
anatofuz
parents:
diff changeset
568
anatofuz
parents:
diff changeset
569 MLIR is an open system; it is designed to support modular and pluggable
anatofuz
parents:
diff changeset
570 dialects. Depending on whether there exists a corresponding dialect and whether
anatofuz
parents:
diff changeset
571 the dialect is plugged in, operations may or may not be registered into MLIR
anatofuz
parents:
diff changeset
572 system. Yet we still need a way to investigate these operations. So the generic
anatofuz
parents:
diff changeset
573 assembly form is mandated by this aspect of MLIR system. It provides a default
anatofuz
parents:
diff changeset
574 textual form for operations.
anatofuz
parents:
diff changeset
575
anatofuz
parents:
diff changeset
576 On the other hand, an assembly form is for assisting developers to investigate
anatofuz
parents:
diff changeset
577 the IR. The generic form serves as a safe fallback but it can be too verbose for
anatofuz
parents:
diff changeset
578 certain ops. Therefore, MLIR gives each dialect the choice to define a custom
anatofuz
parents:
diff changeset
579 assembly form for each operation according to the operation's semantics and
anatofuz
parents:
diff changeset
580 specific needs. The custom assembly form can de-duplicate information from the
anatofuz
parents:
diff changeset
581 operation to derive a more concise form, thus better facilitating the
anatofuz
parents:
diff changeset
582 comprehension of the IR.
anatofuz
parents:
diff changeset
583
anatofuz
parents:
diff changeset
584 ## Examples
anatofuz
parents:
diff changeset
585
anatofuz
parents:
diff changeset
586 This section describes a few very simple examples that help understand how MLIR
anatofuz
parents:
diff changeset
587 represents computation.
anatofuz
parents:
diff changeset
588
anatofuz
parents:
diff changeset
589 ### Non-affine control flow
anatofuz
parents:
diff changeset
590
anatofuz
parents:
diff changeset
591 ```mlir
anatofuz
parents:
diff changeset
592 // A simple linear search in every row of a matrix
anatofuz
parents:
diff changeset
593 for (i = 0; i < N; i++) {
anatofuz
parents:
diff changeset
594 for (j = 0; j < N; j++) {
anatofuz
parents:
diff changeset
595 // dynamic control flow
anatofuz
parents:
diff changeset
596 if (a[i][j] == key) {
anatofuz
parents:
diff changeset
597 s[i] = j;
anatofuz
parents:
diff changeset
598 break;
anatofuz
parents:
diff changeset
599 }
anatofuz
parents:
diff changeset
600 }
anatofuz
parents:
diff changeset
601 }
anatofuz
parents:
diff changeset
602 ```
anatofuz
parents:
diff changeset
603
anatofuz
parents:
diff changeset
604 The presence of dynamic control flow leads to an inner non-affine function
anatofuz
parents:
diff changeset
605 nested in an outer function that using affine loops.
anatofuz
parents:
diff changeset
606
anatofuz
parents:
diff changeset
607 ```mlir
anatofuz
parents:
diff changeset
608 func @search(%A: memref<?x?xi32, %S: <?xi32>, %key : i32) {
anatofuz
parents:
diff changeset
609 %ni = dim %A, 0 : memref<?x?xi32>
anatofuz
parents:
diff changeset
610 // This loop can be parallelized
anatofuz
parents:
diff changeset
611 affine.for %i = 0 to %ni {
anatofuz
parents:
diff changeset
612 call @search_body (%A, %S, %key, %i) : (memref<?x?xi32>, memref<?xi32>, i32, i32)
anatofuz
parents:
diff changeset
613 }
anatofuz
parents:
diff changeset
614 return
anatofuz
parents:
diff changeset
615 }
anatofuz
parents:
diff changeset
616
anatofuz
parents:
diff changeset
617 func @search_body(%A: memref<?x?xi32>, %S: memref<?xi32>, %key: i32, %i : i32) {
anatofuz
parents:
diff changeset
618 %nj = dim %A, 1 : memref<?x?xi32>
anatofuz
parents:
diff changeset
619 br ^bb1(0)
anatofuz
parents:
diff changeset
620
anatofuz
parents:
diff changeset
621 ^bb1(%j: i32)
anatofuz
parents:
diff changeset
622 %p1 = cmpi "lt", %j, %nj : i32
anatofuz
parents:
diff changeset
623 cond_br %p1, ^bb2, ^bb5
anatofuz
parents:
diff changeset
624
anatofuz
parents:
diff changeset
625 ^bb2:
anatofuz
parents:
diff changeset
626 %v = affine.load %A[%i, %j] : memref<?x?xi32>
anatofuz
parents:
diff changeset
627 %p2 = cmpi "eq", %v, %key : i32
anatofuz
parents:
diff changeset
628 cond_br %p2, ^bb3(%j), ^bb4
anatofuz
parents:
diff changeset
629
anatofuz
parents:
diff changeset
630 ^bb3(%j: i32)
anatofuz
parents:
diff changeset
631 affine.store %j, %S[%i] : memref<?xi32>
anatofuz
parents:
diff changeset
632 br ^bb5
anatofuz
parents:
diff changeset
633
anatofuz
parents:
diff changeset
634 ^bb4:
anatofuz
parents:
diff changeset
635 %jinc = addi %j, 1 : i32
anatofuz
parents:
diff changeset
636 br ^bb1(%jinc)
anatofuz
parents:
diff changeset
637
anatofuz
parents:
diff changeset
638 ^bb5:
anatofuz
parents:
diff changeset
639 return
anatofuz
parents:
diff changeset
640 }
anatofuz
parents:
diff changeset
641 ```
anatofuz
parents:
diff changeset
642
anatofuz
parents:
diff changeset
643 As per the [MLIR spec](LangRef.md), the restrictions on dimensions and symbol
anatofuz
parents:
diff changeset
644 identifiers to be used with the affine.apply operation only apply to accesses
anatofuz
parents:
diff changeset
645 inside `affine.for` and `affine.if` operations. However, an analysis of accesses
anatofuz
parents:
diff changeset
646 inside the called function (`@search_body`) is necessary to determine if the
anatofuz
parents:
diff changeset
647 `%i` loop could be parallelized: such function access analysis is calling
anatofuz
parents:
diff changeset
648 context sensitive.
anatofuz
parents:
diff changeset
649
anatofuz
parents:
diff changeset
650 ### Non-affine loop bounds
anatofuz
parents:
diff changeset
651
anatofuz
parents:
diff changeset
652 Loop bounds that are not affine lead to a nesting of functions as shown below.
anatofuz
parents:
diff changeset
653
anatofuz
parents:
diff changeset
654 ```c
anatofuz
parents:
diff changeset
655 for (i = 0; i < N; i++)
anatofuz
parents:
diff changeset
656   for (j = 0; j < N; j++)
anatofuz
parents:
diff changeset
657 // Non-affine loop bound for k loop.
anatofuz
parents:
diff changeset
658     for (k = 0; k < pow(2, j); k++)
anatofuz
parents:
diff changeset
659        for (l = 0; l < N; l++) {
anatofuz
parents:
diff changeset
660         // block loop body
anatofuz
parents:
diff changeset
661         ...
anatofuz
parents:
diff changeset
662        }
anatofuz
parents:
diff changeset
663 ```
anatofuz
parents:
diff changeset
664
anatofuz
parents:
diff changeset
665 ```mlir
anatofuz
parents:
diff changeset
666 func @outer_nest(%n : index) {
anatofuz
parents:
diff changeset
667 affine.for %i = 0 to %n {
anatofuz
parents:
diff changeset
668 affine.for %j = 0 to %n {
anatofuz
parents:
diff changeset
669 %pow = call @pow(2, %j) : (index, index) -> index
anatofuz
parents:
diff changeset
670 call @inner_nest(%pow, %n) : ...
anatofuz
parents:
diff changeset
671 }
anatofuz
parents:
diff changeset
672 }
anatofuz
parents:
diff changeset
673 return
anatofuz
parents:
diff changeset
674 }
anatofuz
parents:
diff changeset
675
anatofuz
parents:
diff changeset
676 func @inner_nest(%m : index, %n : index) {
anatofuz
parents:
diff changeset
677 affine.for %k = 0 to %m {
anatofuz
parents:
diff changeset
678 affine.for %l = 0 to %n {
anatofuz
parents:
diff changeset
679 ...
anatofuz
parents:
diff changeset
680 }
anatofuz
parents:
diff changeset
681 }
anatofuz
parents:
diff changeset
682 return
anatofuz
parents:
diff changeset
683 }
anatofuz
parents:
diff changeset
684 ```
anatofuz
parents:
diff changeset
685
anatofuz
parents:
diff changeset
686 ### Reference 2D Convolution
anatofuz
parents:
diff changeset
687
anatofuz
parents:
diff changeset
688 The following example illustrates a reference implementation of a 2D
anatofuz
parents:
diff changeset
689 convolution, which uses an integer set `#domain` to represent valid input data
anatofuz
parents:
diff changeset
690 in a dilated convolution.
anatofuz
parents:
diff changeset
691
anatofuz
parents:
diff changeset
692 ```mlir
anatofuz
parents:
diff changeset
693 // Dilation factors S0 and S1 can be constant folded if constant at compile time.
anatofuz
parents:
diff changeset
694 #domain = (d0, d1)[S0,S1,S2,S3]: (d0 % S0 == 0, d1 % S1 == 0, d0 >= 0, d1 >= 0,
anatofuz
parents:
diff changeset
695 S3 - d0 - 1 >= 0, S4 - d1 - 1 >= 0)
anatofuz
parents:
diff changeset
696 // Identity map (shown here for illustration).
anatofuz
parents:
diff changeset
697 #map0 = (d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3, d4, d5, d6)
anatofuz
parents:
diff changeset
698
anatofuz
parents:
diff changeset
699 // Affine map from output to input coordinate space.
anatofuz
parents:
diff changeset
700 // d0 = output_h, d1 = output_w, d2 = kernel_h, d3 = kernel_w
anatofuz
parents:
diff changeset
701 // S0 = h_stride, S1 = w_stride, S2 = h_kernel_dilation, S3 = w_kernel_dilation
anatofuz
parents:
diff changeset
702 // S4 = h_pad_low, S5 = w_pad_low
anatofuz
parents:
diff changeset
703 // %out0 = %0#1 * %h_stride + %0#4 * %h_kernel_dilation - %h_pad_low
anatofuz
parents:
diff changeset
704 // %out1= %0#2 * %w_stride + %0#5 * %w_kernel_dilation - %w_pad_low
anatofuz
parents:
diff changeset
705 #map1_0 = (d0, d1, d2, d3) [S0, S1, S2, S3, S4, S5] -> (d0 * S0 + d2 * S2 - %S4)
anatofuz
parents:
diff changeset
706 #map1_1 = (d0, d1, d2, d3) [S0, S1, S2, S3, S4, S5] -> (d1 * S1 + d3 * S3 - %S5)
anatofuz
parents:
diff changeset
707
anatofuz
parents:
diff changeset
708 // Semi-affine map to undilated input coordinate space.
anatofuz
parents:
diff changeset
709 // d0 = input_h, d1 = input_w, S0 = h_base_dilation, S1 = w_base_dilation.
anatofuz
parents:
diff changeset
710 #map2_0 = (d0, d1) [S0, S1] -> (d0 / S0)
anatofuz
parents:
diff changeset
711 #map2_1 = (d0, d1) [S0, S1] -> (d1 / S1)
anatofuz
parents:
diff changeset
712
anatofuz
parents:
diff changeset
713 // Conv2D shapes:
anatofuz
parents:
diff changeset
714 // input: [batch, input_height, input_width, input_feature]
anatofuz
parents:
diff changeset
715 // kernel: [kernel_height, kernel_width, input_feature, output_feature]
anatofuz
parents:
diff changeset
716 // output: [batch, output_height, output_width, output_feature]
anatofuz
parents:
diff changeset
717 func @conv2d(%input: memref<16x1024x1024x3xf32, #lm0, /*scratchpad=*/1>,
anatofuz
parents:
diff changeset
718 %kernel: memref<5x5x3x32xf32, #lm0, /*scratchpad=*/1>,
anatofuz
parents:
diff changeset
719 %output: memref<16x512x512x32xf32, #lm0, /*scratchpad=*/1>) {
anatofuz
parents:
diff changeset
720 affine.for %b = 0 to %batch {
anatofuz
parents:
diff changeset
721 affine.for %oh = 0 to %output_height {
anatofuz
parents:
diff changeset
722 affine.for %ow = 0 to %output_width {
anatofuz
parents:
diff changeset
723 affine.for %of = 0 to %output_feature {
anatofuz
parents:
diff changeset
724 affine.for %kh = 0 to %kernel_height {
anatofuz
parents:
diff changeset
725 affine.for %kw = 0 to %kernel_width {
anatofuz
parents:
diff changeset
726 affine.for %if = 0 to %input_feature {
anatofuz
parents:
diff changeset
727 // Calculate input indices.
anatofuz
parents:
diff changeset
728 %1_0 = affine.apply #map1_0 (%0#1, %0#2, %0#4, %0#5)
anatofuz
parents:
diff changeset
729 [%h_stride, %w_stride, %h_kernel_dilation, %w_kernel_dilation,
anatofuz
parents:
diff changeset
730 %h_pad_low, %w_pad_low]
anatofuz
parents:
diff changeset
731 %1_1 = affine.apply #map1_1 (%0#1, %0#2, %0#4, %0#5)
anatofuz
parents:
diff changeset
732 [%h_stride, %w_stride, %h_kernel_dilation, %w_kernel_dilation,
anatofuz
parents:
diff changeset
733 %h_pad_low, %w_pad_low]
anatofuz
parents:
diff changeset
734
anatofuz
parents:
diff changeset
735 // Check if access is not in padding.
anatofuz
parents:
diff changeset
736 affine.if #domain(%1_0, %1_1)
anatofuz
parents:
diff changeset
737 [%h_base_dilation, %w_kernel_dilation, %h_bound, %w_bound] {
anatofuz
parents:
diff changeset
738 %2_0 = affine.apply #map2 (%1_0, %1_1)
anatofuz
parents:
diff changeset
739 %2_1 = affine.apply #map2 (%1_0, %1_1)
anatofuz
parents:
diff changeset
740 // Compute: output[output_indices] += input[input_indices] * kernel[kernel_indices]
anatofuz
parents:
diff changeset
741 call @multiply_accumulate(%input, %kernel, %output, %b, %oh, %ow, %of, %kh, %kw, %if, %2_0, %2_1)
anatofuz
parents:
diff changeset
742 }
anatofuz
parents:
diff changeset
743 }
anatofuz
parents:
diff changeset
744 }
anatofuz
parents:
diff changeset
745 }
anatofuz
parents:
diff changeset
746 }
anatofuz
parents:
diff changeset
747 }
anatofuz
parents:
diff changeset
748 }
anatofuz
parents:
diff changeset
749 }
anatofuz
parents:
diff changeset
750 return
anatofuz
parents:
diff changeset
751 }
anatofuz
parents:
diff changeset
752 ```
anatofuz
parents:
diff changeset
753
anatofuz
parents:
diff changeset
754 TODO (Add more examples showing the IR for a variety of interesting cases)
anatofuz
parents:
diff changeset
755
anatofuz
parents:
diff changeset
756 ## Design alternatives and extensions
anatofuz
parents:
diff changeset
757
anatofuz
parents:
diff changeset
758 This is a list of some design alternatives and extensions that we discussed in
anatofuz
parents:
diff changeset
759 detail but did not include in the spec or postponed them for future
anatofuz
parents:
diff changeset
760 consideration on demand. We will revisit these discussions when we have more
anatofuz
parents:
diff changeset
761 implementation experience and learn more about the challenges and limitations of
anatofuz
parents:
diff changeset
762 our current design in practice.
anatofuz
parents:
diff changeset
763
anatofuz
parents:
diff changeset
764 ### Polyhedral code representation alternatives: schedule lists vs schedules trees vs affine loop/if forms
anatofuz
parents:
diff changeset
765
anatofuz
parents:
diff changeset
766 The current MLIR uses a representation of polyhedral schedules using a tree of
anatofuz
parents:
diff changeset
767 if/for loops. We extensively debated the tradeoffs involved in the typical
anatofuz
parents:
diff changeset
768 unordered polyhedral instruction representation (where each instruction has
anatofuz
parents:
diff changeset
769 multidimensional schedule information), discussed the benefits of schedule tree
anatofuz
parents:
diff changeset
770 forms, and eventually decided to go with a syntactic tree of affine if/else
anatofuz
parents:
diff changeset
771 conditionals and affine for loops. Discussion of the tradeoff was captured in
anatofuz
parents:
diff changeset
772 this document:
anatofuz
parents:
diff changeset
773 [ MLIR: The case for a simplified polyhedral form](RationaleSimplifiedPolyhedralForm.md).
anatofuz
parents:
diff changeset
774
anatofuz
parents:
diff changeset
775 At a high level, we have two alternatives here:
anatofuz
parents:
diff changeset
776
anatofuz
parents:
diff changeset
777 1. Schedule tree representation instead of an affine loop AST form: The current
anatofuz
parents:
diff changeset
778 proposal uses an affine loop and conditional tree form, which is syntactic
anatofuz
parents:
diff changeset
779 and with no separation of domains as sets and schedules as multidimensional
anatofuz
parents:
diff changeset
780 affine functions. A schedule tree form however makes polyhedral domains and
anatofuz
parents:
diff changeset
781 schedules a first class concept in the IR allowing compact expression of
anatofuz
parents:
diff changeset
782 transformations through the schedule tree without changing the domains of
anatofuz
parents:
diff changeset
783 instructions. Such a representation also hides prologues, epilogues, partial
anatofuz
parents:
diff changeset
784 tiles, complex loop bounds and conditionals making loop nests free of
anatofuz
parents:
diff changeset
785 "syntax". Cost models instead look at domains and schedules. In addition, if
anatofuz
parents:
diff changeset
786 necessary such a domain schedule representation can be normalized to
anatofuz
parents:
diff changeset
787 explicitly propagate the schedule into domains and model all the cleanup
anatofuz
parents:
diff changeset
788 code. An example and more detail on the schedule tree form is in the next
anatofuz
parents:
diff changeset
789 section.
anatofuz
parents:
diff changeset
790 1. Having two different forms of "affine regions": an affine loop tree form
anatofuz
parents:
diff changeset
791 and a polyhedral schedule tree form. In the latter, ops could carry
anatofuz
parents:
diff changeset
792 attributes capturing domain, scheduling, and other polyhedral code
anatofuz
parents:
diff changeset
793 generation options with IntegerSet, AffineMap, and other attributes.
anatofuz
parents:
diff changeset
794
anatofuz
parents:
diff changeset
795 #### Schedule Tree Representation for Affine Regions
anatofuz
parents:
diff changeset
796
anatofuz
parents:
diff changeset
797 This representation is based on a simplified form of the domain/schedule
anatofuz
parents:
diff changeset
798 representation used by the polyhedral compiler community. Domains represent what
anatofuz
parents:
diff changeset
799 has to be executed while schedules represent the order in which domain elements
anatofuz
parents:
diff changeset
800 are interleaved. We model domains as non-piece-wise convex integer sets, and
anatofuz
parents:
diff changeset
801 schedules as affine functions; however, the former can be disjunctive, and the
anatofuz
parents:
diff changeset
802 latter can be piece-wise affine relations. In the schedule tree representation,
anatofuz
parents:
diff changeset
803 domain and schedules for instructions are represented in a tree-like structure
anatofuz
parents:
diff changeset
804 which is called a schedule tree. Each non-leaf node of the tree is an abstract
anatofuz
parents:
diff changeset
805 polyhedral dimension corresponding to an abstract fused loop for each ML
anatofuz
parents:
diff changeset
806 instruction that appears in that branch. Each leaf node is an ML Instruction.
anatofuz
parents:
diff changeset
807
anatofuz
parents:
diff changeset
808 ```mlir
anatofuz
parents:
diff changeset
809 // A tiled matmul code (128x128x128) represented in schedule tree form
anatofuz
parents:
diff changeset
810
anatofuz
parents:
diff changeset
811 // #map0 = (d0, d1, d2, d3, d4, d5) -> (128*d0 + d3, 128*d1 + d4, 128*d2 + d5)
anatofuz
parents:
diff changeset
812 #intset_ij = (i, j) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0, -j + N-1 >= 0
anatofuz
parents:
diff changeset
813 #intset_ijk = (i, j, k) [M, N, K] : i >= 0, -i + N - 1 >= 0, j >= 0,
anatofuz
parents:
diff changeset
814 -j + M-1 >= 0, k >= 0, -k + N - 1 >= 0)
anatofuz
parents:
diff changeset
815 func @matmul(%A, %B, %C, %M, %N, %K) : (...) { // %M, N, K are symbols
anatofuz
parents:
diff changeset
816 // t1, t2, t3, t4, t5, t6 are abstract polyhedral loops
anatofuz
parents:
diff changeset
817 mldim %t1 : {S1,S2,S3,S4,S5} floordiv (i, 128) {
anatofuz
parents:
diff changeset
818 mldim %t2 : {S1,S2,S3,S4,S5} floordiv (j, 128) {
anatofuz
parents:
diff changeset
819 // (%i, %j) = affine.apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
anatofuz
parents:
diff changeset
820 call dma_mem_to_scratchpad(%C, %i, %j, %M, %N, %K)
anatofuz
parents:
diff changeset
821 with @intset_ij(%i, %j) [%M, %N, %K]
anatofuz
parents:
diff changeset
822 mldim %t3 : {S2,S3,S4,S5} floordiv (k, 128) {
anatofuz
parents:
diff changeset
823 // (%i, %j, %k) = affine.apply (d0, d1, d2)
anatofuz
parents:
diff changeset
824 // -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
anatofuz
parents:
diff changeset
825 call dma_mem_to_scratchpad(%A, ...) with #inset_ijk (%i, %j, %k) [%M, %N, %K]
anatofuz
parents:
diff changeset
826 // (%i, %j, %k) = affine.apply (d0, d1, d2)
anatofuz
parents:
diff changeset
827 // -> (128*d0, 128*d1, 128*d2) (%t1, %t2, %t3)
anatofuz
parents:
diff changeset
828 call dma_mem_to_scratchpad(%B, ...) with #inset_ijk (%i, %j, %k) [%M, %N, %K]
anatofuz
parents:
diff changeset
829 mldim %t4 : {S4} i mod 128 {
anatofuz
parents:
diff changeset
830 mldim %t5 : {S4} j mod 128 {
anatofuz
parents:
diff changeset
831 mldim %t6 : {S4} k mod 128 {
anatofuz
parents:
diff changeset
832 // (%i, %j, %k) = affine.apply #map0 (%t1, %t2, %t3, %t4, %t5, %t6)
anatofuz
parents:
diff changeset
833 call matmul_body(A, B, C, %i, %j, %k, %M, %N, %K)
anatofuz
parents:
diff changeset
834 with #inset_ijk(%i, %j, %k) [%M, %N, %K]
anatofuz
parents:
diff changeset
835 } // end mld4im t6
anatofuz
parents:
diff changeset
836 } // end mldim t5
anatofuz
parents:
diff changeset
837 } // end mldim t4
anatofuz
parents:
diff changeset
838 } // end mldim t3
anatofuz
parents:
diff changeset
839 // (%i, %j) = affine.apply (d0, d1) -> (128*d0, 128*d1) (%t1, %t2)
anatofuz
parents:
diff changeset
840 call $dma_scratchpad_to_mem_C ... with #intset(%i, %j) [%M, %N, %K]
anatofuz
parents:
diff changeset
841 } // end mldim t2
anatofuz
parents:
diff changeset
842 } // end mldim t1
anatofuz
parents:
diff changeset
843 return
anatofuz
parents:
diff changeset
844 }
anatofuz
parents:
diff changeset
845
anatofuz
parents:
diff changeset
846 ```
anatofuz
parents:
diff changeset
847
anatofuz
parents:
diff changeset
848 ### Affine Relations
anatofuz
parents:
diff changeset
849
anatofuz
parents:
diff changeset
850 The current MLIR spec includes affine maps and integer sets, but not affine
anatofuz
parents:
diff changeset
851 relations. Affine relations are a natural way to model read and write access
anatofuz
parents:
diff changeset
852 information, which can be very useful to capture the behavior of opaque external
anatofuz
parents:
diff changeset
853 library calls, high-performance vendor libraries, or user-provided / user-tuned
anatofuz
parents:
diff changeset
854 routines.
anatofuz
parents:
diff changeset
855
anatofuz
parents:
diff changeset
856 An affine relation is a relation between input and output dimension identifiers
anatofuz
parents:
diff changeset
857 while being symbolic on a list of symbolic identifiers and with affine
anatofuz
parents:
diff changeset
858 constraints on the identifiers.
anatofuz
parents:
diff changeset
859
anatofuz
parents:
diff changeset
860 Syntax:
anatofuz
parents:
diff changeset
861
anatofuz
parents:
diff changeset
862 ```
anatofuz
parents:
diff changeset
863 // Affine relation definition at the top of file
anatofuz
parents:
diff changeset
864 affine-rel-def ::= affine-rel-id `=` affine-relation-inline
anatofuz
parents:
diff changeset
865
anatofuz
parents:
diff changeset
866 affine-rel-id ::= `##` prefixed-id
anatofuz
parents:
diff changeset
867
anatofuz
parents:
diff changeset
868 affine-relation-inline ::=
anatofuz
parents:
diff changeset
869 `(` input-dims `)` (`[` symbols `]`)? `->`
anatofuz
parents:
diff changeset
870 `(` output-dims `)` : affine-constraint-conjunction
anatofuz
parents:
diff changeset
871
anatofuz
parents:
diff changeset
872 input-dims ::= bare-id-list
anatofuz
parents:
diff changeset
873 output-dims ::= bare-id-list
anatofuz
parents:
diff changeset
874 symbols ::= bare-id-list
anatofuz
parents:
diff changeset
875
anatofuz
parents:
diff changeset
876 affine-rel ::= affine-rel-id | affine-relation-inline
anatofuz
parents:
diff changeset
877
anatofuz
parents:
diff changeset
878 // Usage
anatofuz
parents:
diff changeset
879 affine-rel-spec ::= affine-rel dim-and-symbol-use-list
anatofuz
parents:
diff changeset
880 ```
anatofuz
parents:
diff changeset
881
anatofuz
parents:
diff changeset
882 All identifiers appearing in input-dims, output-dims, and symbol-dims are
anatofuz
parents:
diff changeset
883 pairwise distinct. All affine-constraint non-terminals in the above syntax are
anatofuz
parents:
diff changeset
884 allowed to contain identifiers only from input-dims, output-dims, and
anatofuz
parents:
diff changeset
885 symbol-dims.
anatofuz
parents:
diff changeset
886
anatofuz
parents:
diff changeset
887 Affine relations are used to model read, write, may_read, and may_write sets of
anatofuz
parents:
diff changeset
888 functions in the IR. The output dimension identifiers correspond to the data
anatofuz
parents:
diff changeset
889 dimensions.
anatofuz
parents:
diff changeset
890
anatofuz
parents:
diff changeset
891 Example:
anatofuz
parents:
diff changeset
892
anatofuz
parents:
diff changeset
893 ```mlir
anatofuz
parents:
diff changeset
894 // read relation: two elements ( d0 <= r0 <= d0+1 )
anatofuz
parents:
diff changeset
895 ##aff_rel9 = (d0) -> (r0) : r0 - d0 >= 0, d0 - r0 + 1 >= 0
anatofuz
parents:
diff changeset
896
anatofuz
parents:
diff changeset
897 func @count (%A : memref<128xf32>, %pos : i32) -> f32
anatofuz
parents:
diff changeset
898 reads: {%A ##aff_rel9 (%pos)}
anatofuz
parents:
diff changeset
899 writes: /* empty */
anatofuz
parents:
diff changeset
900 may_reads: /* empty */
anatofuz
parents:
diff changeset
901 may_writes: /* empty */ {
anatofuz
parents:
diff changeset
902 bb0 (%0, %1: memref<128xf32>, i64):
anatofuz
parents:
diff changeset
903 %val = affine.load %A [%pos]
anatofuz
parents:
diff changeset
904 %val = affine.load %A [%pos + 1]
anatofuz
parents:
diff changeset
905 %p = mulf %val, %val : f32
anatofuz
parents:
diff changeset
906 return %p : f32
anatofuz
parents:
diff changeset
907 }
anatofuz
parents:
diff changeset
908 ```
anatofuz
parents:
diff changeset
909
anatofuz
parents:
diff changeset
910 ### Regions
anatofuz
parents:
diff changeset
911
anatofuz
parents:
diff changeset
912 #### Making function definition an operation
anatofuz
parents:
diff changeset
913
anatofuz
parents:
diff changeset
914 MLIR supports values of a Function type. Instead of having first-class IR
anatofuz
parents:
diff changeset
915 concept for functions, one could define an operation with a body region that
anatofuz
parents:
diff changeset
916 defines a function value. The particularity of functions is that their names are
anatofuz
parents:
diff changeset
917 globally visible and can be referred to before being defined, unlike SSA values
anatofuz
parents:
diff changeset
918 that must be defined first. Implementing a "function definition" operation would
anatofuz
parents:
diff changeset
919 require to relax some of the SSA constraints in a region, and also make the IR
anatofuz
parents:
diff changeset
920 Module a region as well. It would also affect the core infrastructure (e.g.,
anatofuz
parents:
diff changeset
921 function passes) only for the sake of concept unification.
anatofuz
parents:
diff changeset
922
anatofuz
parents:
diff changeset
923 #### Having types on a region
anatofuz
parents:
diff changeset
924
anatofuz
parents:
diff changeset
925 Instead of inspecting the types of arguments of the first block, one could give
anatofuz
parents:
diff changeset
926 the region itself a type. This type would be redundant with block argument
anatofuz
parents:
diff changeset
927 types, which must have values and create room for type mismatches. While
anatofuz
parents:
diff changeset
928 functions do have types that are partly redundant with the arguments of the
anatofuz
parents:
diff changeset
929 first block in the function, this is necessary to support function declarations
anatofuz
parents:
diff changeset
930 that do not have a body which we can refer to in order to obtain the argument
anatofuz
parents:
diff changeset
931 types. A region is always contained in an operation or a function that can be
anatofuz
parents:
diff changeset
932 queried to obtain the “type” of the region if necessary.
anatofuz
parents:
diff changeset
933
anatofuz
parents:
diff changeset
934 A type on a region can be justified if Regions were to be considered separately
anatofuz
parents:
diff changeset
935 from the enclosing entity (operation or function) and had their own semantics
anatofuz
parents:
diff changeset
936 that should be checked.
anatofuz
parents:
diff changeset
937
anatofuz
parents:
diff changeset
938 #### Attaching attributes to regions
anatofuz
parents:
diff changeset
939
anatofuz
parents:
diff changeset
940 Regions could be annotated with dialect attributes to use attribute verification
anatofuz
parents:
diff changeset
941 hooks. An operation could take multiple regions as arguments, and each of them
anatofuz
parents:
diff changeset
942 may require different attributes. However, there are currently very few
anatofuz
parents:
diff changeset
943 practical cases where this would be necessary. Instead, one could simulate
anatofuz
parents:
diff changeset
944 per-region attributes with array attributes attached to the entity containing
anatofuz
parents:
diff changeset
945 the region (operation or function). This decreases the overall complexity of the
anatofuz
parents:
diff changeset
946 IR and enables more concise and op-specific forms, e.g., when all regions of an
anatofuz
parents:
diff changeset
947 op have the same attribute that can be only mentioned once. Since the semantics
anatofuz
parents:
diff changeset
948 of the region is entirely defined by the enclosing entity, it also makes sense
anatofuz
parents:
diff changeset
949 to have attributes attached to that entity rather than to the region itself.
anatofuz
parents:
diff changeset
950
anatofuz
parents:
diff changeset
951 This can be reconsidered in the future if we see a non-neglectable amount of use
anatofuz
parents:
diff changeset
952 cases.
anatofuz
parents:
diff changeset
953
anatofuz
parents:
diff changeset
954 ### Read/Write/May_Read/May_Write sets for External Functions
anatofuz
parents:
diff changeset
955
anatofuz
parents:
diff changeset
956 Having read, write, may_read, and may_write sets for external functions which
anatofuz
parents:
diff changeset
957 include opaque ones, high-performance vendor libraries such as CuDNN, CuB, MKL,
anatofuz
parents:
diff changeset
958 FFT libraries, user-provided/optimized functions, or data movement runtimes such
anatofuz
parents:
diff changeset
959 as DMA ones is a powerful feature. It allows the compiler to perform analysis,
anatofuz
parents:
diff changeset
960 composition/transformation in the presence of such calls and with loops around
anatofuz
parents:
diff changeset
961 such calls on sub-tensors. For user-provided or custom hand-tuned functions, the
anatofuz
parents:
diff changeset
962 read/write/may_read/may_write sets could be provided a-priori by a user as part
anatofuz
parents:
diff changeset
963 of the external function signature or they could be part of a database.
anatofuz
parents:
diff changeset
964
anatofuz
parents:
diff changeset
965 TODO: Design this, and update to use function attribute syntax.
anatofuz
parents:
diff changeset
966
anatofuz
parents:
diff changeset
967 Example:
anatofuz
parents:
diff changeset
968
anatofuz
parents:
diff changeset
969 ```mlir
anatofuz
parents:
diff changeset
970 ##rel9 ( ) [s0] -> (r0, r1) : 0 <= r0 <= 1023, 0 <= r1 <= s0 - 1
anatofuz
parents:
diff changeset
971
anatofuz
parents:
diff changeset
972 func @cblas_reduce_ffi(%M: memref<1024 x ? x f32, #layout_map0, /*mem=*/0>)
anatofuz
parents:
diff changeset
973 -> f32 [
anatofuz
parents:
diff changeset
974 reads: {%M, ##rel9() }
anatofuz
parents:
diff changeset
975 writes: /* empty */
anatofuz
parents:
diff changeset
976 may_reads: /* empty */
anatofuz
parents:
diff changeset
977 may_writes: /* empty */
anatofuz
parents:
diff changeset
978 ]
anatofuz
parents:
diff changeset
979
anatofuz
parents:
diff changeset
980 func @dma_mem_to_scratchpad(%a : memref<1024 x f32, #layout_map0, /*mem=*/0>,
anatofuz
parents:
diff changeset
981 %b : memref<1024 x f32, #layout_map0, 1>, %c : memref<1024 x f32,
anatofuz
parents:
diff changeset
982 #layout_map0>) [
anatofuz
parents:
diff changeset
983 reads: {%M, ##rel9() }
anatofuz
parents:
diff changeset
984 writes: /* empty */
anatofuz
parents:
diff changeset
985 may_reads: /* empty */
anatofuz
parents:
diff changeset
986 may_writes: /* empty */
anatofuz
parents:
diff changeset
987 ]
anatofuz
parents:
diff changeset
988
anatofuz
parents:
diff changeset
989 ```
anatofuz
parents:
diff changeset
990
anatofuz
parents:
diff changeset
991 ### Memref Extensions
anatofuz
parents:
diff changeset
992
anatofuz
parents:
diff changeset
993 1. Arbitrary polyhedral shapes for tensors: e.g., triangular shapes in tensor
anatofuz
parents:
diff changeset
994 dimensions where there is symmetry: use integer set (affine constraints) to
anatofuz
parents:
diff changeset
995 model tensor data space (instead of just extents). Requires some changes to
anatofuz
parents:
diff changeset
996 the IR and the in-memory form.
anatofuz
parents:
diff changeset
997 1. Layout maps
anatofuz
parents:
diff changeset
998
anatofuz
parents:
diff changeset
999 1. Allow piece-wise affine maps for layouts: allows clean modeling of
anatofuz
parents:
diff changeset
1000 boundary cases for images/tensors through padding, wrapping, mirroring,
anatofuz
parents:
diff changeset
1001 padding where padded values are the results of computation as opposed to
anatofuz
parents:
diff changeset
1002 data, padding in the interior as opposed to just boundaries.
anatofuz
parents:
diff changeset
1003 1. Allow many-to-one layout maps: Index and layout maps in the current
anatofuz
parents:
diff changeset
1004 proposal are bijective. Extending them to many-to-one layout maps allows
anatofuz
parents:
diff changeset
1005 cleaner(?) modeling of broadcast/reduce style computations while reusing
anatofuz
parents:
diff changeset
1006 memory.
anatofuz
parents:
diff changeset
1007
anatofuz
parents:
diff changeset
1008 Proposal 2(a) requires non-trivial changes to the IR and the in-memory
anatofuz
parents:
diff changeset
1009 representation. 2(b) requires no change, but impacts how cost models look at
anatofuz
parents:
diff changeset
1010 index and layout maps.
anatofuz
parents:
diff changeset
1011
anatofuz
parents:
diff changeset
1012 ### `affine.if` and `affine.for` Extensions for "Escaping Scalars"
anatofuz
parents:
diff changeset
1013
anatofuz
parents:
diff changeset
1014 We considered providing a representation for SSA values that are live out of
anatofuz
parents:
diff changeset
1015 `if/else` conditional bodies and loop carried in `affine.for` loops. We
anatofuz
parents:
diff changeset
1016 ultimately abandoned this approach due to its complexity. In the current design
anatofuz
parents:
diff changeset
1017 of MLIR, scalar variables cannot escape for loops or if instructions. In
anatofuz
parents:
diff changeset
1018 situations, where escaping is necessary, we use zero-dimensional tensors and
anatofuz
parents:
diff changeset
1019 memrefs instead of scalars.
anatofuz
parents:
diff changeset
1020
anatofuz
parents:
diff changeset
1021 **TODO**: This whole section is obsolete and should be updated to use block
anatofuz
parents:
diff changeset
1022 arguments and a yield like terminator in for/if instructions.
anatofuz
parents:
diff changeset
1023
anatofuz
parents:
diff changeset
1024 The abandoned design of supporting escaping scalars is as follows:
anatofuz
parents:
diff changeset
1025
anatofuz
parents:
diff changeset
1026 #### affine.for Instruction
anatofuz
parents:
diff changeset
1027
anatofuz
parents:
diff changeset
1028 Syntax:
anatofuz
parents:
diff changeset
1029
anatofuz
parents:
diff changeset
1030 ```
anatofuz
parents:
diff changeset
1031 [<out-var-list> =]
anatofuz
parents:
diff changeset
1032 for %<index-variable-name> = <lower-bound> ... <upper-bound> step <step>
anatofuz
parents:
diff changeset
1033 [with <in-var-list>] { <loop-instruction-list> }
anatofuz
parents:
diff changeset
1034 ```
anatofuz
parents:
diff changeset
1035
anatofuz
parents:
diff changeset
1036 out-var-list is a comma separated list of SSA values defined in the loop body
anatofuz
parents:
diff changeset
1037 and used outside the loop body. in-var-list is a comma separated list of SSA
anatofuz
parents:
diff changeset
1038 values used inside the loop body and their initializers. loop-instruction-list
anatofuz
parents:
diff changeset
1039 is a list of instructions that may also include a yield instruction.
anatofuz
parents:
diff changeset
1040
anatofuz
parents:
diff changeset
1041 Example:
anatofuz
parents:
diff changeset
1042
anatofuz
parents:
diff changeset
1043 ```mlir
anatofuz
parents:
diff changeset
1044 // Return sum of elements in 1-dimensional mref A
anatofuz
parents:
diff changeset
1045 func i32 @sum(%A : memref<?xi32>, %N : i32) -> (i32) {
anatofuz
parents:
diff changeset
1046 %init = 0
anatofuz
parents:
diff changeset
1047 %result = affine.for %i = 0 to N with %tmp(%init) {
anatofuz
parents:
diff changeset
1048 %value = affine.load %A[%i]
anatofuz
parents:
diff changeset
1049 %sum = %value + %tmp
anatofuz
parents:
diff changeset
1050 yield %sum
anatofuz
parents:
diff changeset
1051 }
anatofuz
parents:
diff changeset
1052 return %result : i32
anatofuz
parents:
diff changeset
1053 }
anatofuz
parents:
diff changeset
1054 ```
anatofuz
parents:
diff changeset
1055
anatofuz
parents:
diff changeset
1056 #### affine.if/else Instruction
anatofuz
parents:
diff changeset
1057
anatofuz
parents:
diff changeset
1058 Syntax:
anatofuz
parents:
diff changeset
1059
anatofuz
parents:
diff changeset
1060 ```
anatofuz
parents:
diff changeset
1061 <out-var-list> = affine.if (<cond-list>) {...} [else {...}]
anatofuz
parents:
diff changeset
1062 ```
anatofuz
parents:
diff changeset
1063
anatofuz
parents:
diff changeset
1064 Out-var-list is a list of SSA values defined by the if-instruction. The values
anatofuz
parents:
diff changeset
1065 are arguments to the yield-instruction that occurs in both then and else clauses
anatofuz
parents:
diff changeset
1066 when else clause is present. When if instruction contains only if clause, the
anatofuz
parents:
diff changeset
1067 escaping value defined in the then clause should be merged with the value the
anatofuz
parents:
diff changeset
1068 variable had before the if instruction. The design captured here does not handle
anatofuz
parents:
diff changeset
1069 this situation.
anatofuz
parents:
diff changeset
1070
anatofuz
parents:
diff changeset
1071 Example:
anatofuz
parents:
diff changeset
1072
anatofuz
parents:
diff changeset
1073 ```mlir
anatofuz
parents:
diff changeset
1074 // Compute sum of half of the array
anatofuz
parents:
diff changeset
1075 func i32 @sum_half(%A : memref<?xi32>, %N : i32) -> (i32) {
anatofuz
parents:
diff changeset
1076 %s0 = 0
anatofuz
parents:
diff changeset
1077 %s1 = affine.for %i = 1 ... N step 1 with %s2 (%s0) {
anatofuz
parents:
diff changeset
1078 %s3 = if (%i >= %N / 2) {
anatofuz
parents:
diff changeset
1079 %v0 = affine.load %A[%i]
anatofuz
parents:
diff changeset
1080 %s4 = %s2 + %v0
anatofuz
parents:
diff changeset
1081 yield %s4
anatofuz
parents:
diff changeset
1082 }
anatofuz
parents:
diff changeset
1083 yield %s3
anatofuz
parents:
diff changeset
1084 }
anatofuz
parents:
diff changeset
1085 return %s1 : i32
anatofuz
parents:
diff changeset
1086 }
anatofuz
parents:
diff changeset
1087 ```
anatofuz
parents:
diff changeset
1088
anatofuz
parents:
diff changeset
1089 ### Multithreading the compiler
anatofuz
parents:
diff changeset
1090
anatofuz
parents:
diff changeset
1091 People want compilers to go fast, and one simple way to do that is to
anatofuz
parents:
diff changeset
1092 multi-thread them. There are multiple strategies for this, but a simple one is
anatofuz
parents:
diff changeset
1093 to optimize and compile separate functions in parallel. LLVM's original pass
anatofuz
parents:
diff changeset
1094 manager anticipated this demand, and the CallGraphSCCPass manager is even
anatofuz
parents:
diff changeset
1095 designed to support this as well, but unfortunately, a few early design
anatofuz
parents:
diff changeset
1096 decisions in LLVM prevent this from ever happening. Instead, things like ThinLTO
anatofuz
parents:
diff changeset
1097 are forced to split programs into separate LLVM modules/context and optimize
anatofuz
parents:
diff changeset
1098 those chunks independently.
anatofuz
parents:
diff changeset
1099
anatofuz
parents:
diff changeset
1100 The problem is that LLVM has several objects in its IR that are globally uniqued
anatofuz
parents:
diff changeset
1101 and also mutable: notably constants like `i32 0`. In LLVM, these constants are
anatofuz
parents:
diff changeset
1102 `Value`'s, which allow them to be used as operands to instructions, and that
anatofuz
parents:
diff changeset
1103 they also have SSA use lists. Because these things are uniqued, every `i32 0` in
anatofuz
parents:
diff changeset
1104 any function shares a use list. This means that optimizing multiple functions in
anatofuz
parents:
diff changeset
1105 parallel won't work (at least without some sort of synchronization on the use
anatofuz
parents:
diff changeset
1106 lists, which would be unbearably inefficient).
anatofuz
parents:
diff changeset
1107
anatofuz
parents:
diff changeset
1108 MLIR now supports a multithreaded pass manager. We do this through several
anatofuz
parents:
diff changeset
1109 design choices:
anatofuz
parents:
diff changeset
1110
anatofuz
parents:
diff changeset
1111 1. MLIR makes use of extensive uniqued immutable data structures (affine
anatofuz
parents:
diff changeset
1112 expressions, types, etc are all immutable, uniqued, and immortal).
anatofuz
parents:
diff changeset
1113 2. Constants are defined in per-function pools, instead of being globally
anatofuz
parents:
diff changeset
1114 uniqued.
anatofuz
parents:
diff changeset
1115 3. Functions themselves are not SSA values either, so they don't have the same
anatofuz
parents:
diff changeset
1116 problem as constants.
anatofuz
parents:
diff changeset
1117 4. FunctionPasses are copied (through their copy ctor) into one instance per
anatofuz
parents:
diff changeset
1118 thread, avoiding sharing of local state across threads.
anatofuz
parents:
diff changeset
1119
anatofuz
parents:
diff changeset
1120 This allows MLIR function passes to support efficient multithreaded compilation
anatofuz
parents:
diff changeset
1121 and code generation.