<!--===- docs/ArrayComposition.md

Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
See https://llvm.org/LICENSE.txt for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```eval_rst
.. contents::
   :local:
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions and
makes those evaluations more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=)`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.

## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.
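
For example (with hypothetical declarations):
```
REAL :: A(100), B(100)
! No temporary is needed: B does not appear on the right-hand side,
! and ABS is elemental.
B = ABS(A) + 1.0
! A temporary (or equivalent care) is generally needed: A appears on
! the right-hand side with a different element ordering.
A = A(100:1:-1)
```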

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.
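
So `B = TRANSPOSE(A + 1.0)` (assuming `REAL :: A(N,M), B(M,N)`) could be
evaluated without a temporary; a minimal sketch:
```
DO CONCURRENT(J=1:M,K=1:N)
  B(J,K) = A(K,J) + 1.0  ! element (J,K) of TRANSPOSE(A + 1.0)
END DO
```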

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.
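
So `B = SUM(A,DIM=2)` (assuming `REAL :: A(N,M), B(N)`) could be evaluated,
in sketch form, as:
```
DO CONCURRENT(J=1:N)
  B(J) = SUM(A(J,:))  ! each element is a complete reduction
END DO
```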

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `s`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique (see the sketch after this list).
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.
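
As referenced in the `RESHAPE` item above, the index arithmetic can be
sketched as a function that recovers one element of the reshaped result
directly from its argument (hypothetical name, assuming a rank-2 argument
and a rank-2 result):
```
PURE REAL FUNCTION RESHAPED_AT(A, P, J, K)
  ! Element (J,K) of RESHAPE(A, [P, Q]), computed directly from A
  ! without materializing the reshaped result.
  REAL, INTENT(IN) :: A(:,:)      ! original array
  INTEGER, INTENT(IN) :: P, J, K  ! P is the first extent of SHAPE=
  INTEGER :: OFFSET
  OFFSET = (J - 1) + P*(K - 1)    ! column-major offset in the result
  RESHAPED_AT = A(1 + MODULO(OFFSET, SIZE(A, 1)), &
                  1 + OFFSET/SIZE(A, 1))  ! same offset in A's storage order
END FUNCTION
```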

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.

The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general (see the example
  after this list).
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.
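
A small self-contained check of the first rule above (hypothetical program;
the shape is obtained without evaluating any element of the reduction):
```
PROGRAM SHAPE_DEMO
  REAL :: X(2,3,4)
  INTEGER :: J, D
  X = 0.0
  D = 2
  PRINT *, PACK(SHAPE(X), [(J, J=1,RANK(X))] /= D)  ! prints 2 4
  PRINT *, SHAPE(SUM(X, DIM=D))                     ! agrees: 2 4
END PROGRAM
```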

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array and the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of stack storage when the
variable's scope is local.
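
For instance, a decoupled equivalent of `ALLOCATE(V, SOURCE=SUM(A,DIM=1))`
(hypothetical variables, assuming `REAL :: A(N,M)` and
`REAL, ALLOCATABLE :: V(:)`) might be generated as:
```
ALLOCATE(V(SIZE(A,2)))        ! 1) compute the shape and allocate
DO CONCURRENT(K=1:SIZE(A,2))  ! 2) populate the new storage in place
  V(K) = SUM(A(:,K))
END DO
```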

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.
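
For example:
```
REAL, ALLOCATABLE :: A(:)
A = [1.0, 2.0]       ! allocates A with shape [2]
A = [1.0, 2.0, 3.0]  ! non-conforming: A is automatically reallocated to shape [3]
```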

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`
  (see the sketch after this list).
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If X is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.
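
As a concrete rendering of the concatenation rule above (hypothetical
function name), element `j` of `[X,Y]` can be computed without
materializing the constructor; an `IF` is used rather than a literal
`MERGE` so that the out-of-range alternative is never referenced:
```
PURE REAL FUNCTION CONCAT_AT(X, Y, J)
  REAL, INTENT(IN) :: X(:), Y(:)
  INTEGER, INTENT(IN) :: J
  IF (J > SIZE(X)) THEN
    CONCAT_AT = Y(J - SIZE(X))  ! element comes from the second constituent
  ELSE
    CONCAT_AT = X(J)            ! element comes from the first constituent
  END IF
END FUNCTION
```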

(more...)