<!--===- docs/ArrayComposition.md

Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
See https://llvm.org/LICENSE.txt for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```eval_rst
.. contents::
   :local:
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions and
makes them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.

## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.
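
For illustration only, here is a minimal, self-contained rendering of that
idea, with the statement function recast as a pure internal function (the
program and names are invented; they just replay the example above):
```
PROGRAM demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4, M = 3
  REAL :: A(N,M), B(N,M)
  INTEGER :: J, K
  A = 1.0
  ! Apply the right-hand side of B = A + 1.0, viewed as a function of
  ! the index tuple, directly at each destination element; no temporary.
  DO CONCURRENT(J=1:N, K=1:M)
    B(J,K) = F(J,K)
  END DO
  PRINT *, ALL(B == 2.0)  ! expected: T
CONTAINS
  ! F is the right-hand side expression A + 1.0 viewed as a function of (J,K).
  PURE REAL FUNCTION F(J, K)
    INTEGER, INTENT(IN) :: J, K
    F = A(J,K) + 1.0
  END FUNCTION
END PROGRAM
```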

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.
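
As a concrete check of these two compositions (a sketch only; the shapes,
values, and names here are arbitrary):
```
PROGRAM compose
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 4
  REAL :: A(N,M), B(M,N), S(N)
  INTEGER :: J, K
  A = RESHAPE([(REAL(J), J=1,N*M)], [N,M])
  ! B = TRANSPOSE(A + 1.0), evaluated by composing the index exchange
  ! with the elemental addition; no temporary for the right-hand side.
  DO CONCURRENT(J=1:M, K=1:N)
    B(J,K) = A(K,J) + 1.0
  END DO
  ! S = SUM(A, DIM=2), each element evaluated as a complete reduction.
  DO CONCURRENT(J=1:N)
    S(J) = SUM(A(J,:))
  END DO
  PRINT *, ALL(B == TRANSPOSE(A + 1.0)), ALL(S == SUM(A, DIM=2))  ! expected: T T
END PROGRAM
```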

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `s`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique; a sketch of the index arithmetic follows this list.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.
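
Here is the index arithmetic for `RESHAPE` without `ORDER=` as a runnable
sketch, assuming column-major element order and conforming sizes (the
shapes chosen here are arbitrary):
```
PROGRAM reshape_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 4, P = 2, Q = 6  ! N*M == P*Q
  REAL :: A(N,M), R(P,Q)
  INTEGER :: I, J, OFFSET
  A = RESHAPE([(REAL(I), I=1,N*M)], [N,M])
  DO J = 1, Q
    DO I = 1, P
      ! Linearize the result index tuple in column-major order, then
      ! delinearize the offset against the captured shape of A.
      OFFSET = (I-1) + P*(J-1)
      R(I,J) = A(MOD(OFFSET,N)+1, OFFSET/N+1)
    END DO
  END DO
  PRINT *, ALL(R == RESHAPE(A, [P,Q]))  ! expected: T
END PROGRAM
```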

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.

The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well; the program after
the list below checks several of these identities.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.
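
The following program (shapes chosen arbitrarily, purely for illustration)
exercises several of the identities above; the comments give the expected
output:
```
PROGRAM shapes
  IMPLICIT NONE
  REAL :: X(3,4,5), M(2,7), V(6)
  X = 0.0; M = 0.0; V = 0.0
  PRINT *, SHAPE(SUM(X, DIM=2))               ! 3 5
  PRINT *, SHAPE(MAXLOC(X))                   ! 3, i.e. [RANK(X)]
  PRINT *, SHAPE(TRANSPOSE(M))                ! 7 2
  PRINT *, SHAPE(RESHAPE(V, [2, 3]))          ! 2 3
  PRINT *, SHAPE(CSHIFT(M, 1))                ! 2 7
  PRINT *, SHAPE(SPREAD(M, DIM=3, NCOPIES=4)) ! 2 7 4
  PRINT *, SHAPE(MATMUL(M, TRANSPOSE(M)))     ! 2 2
  PRINT *, SHAPE(SHAPE(X))                    ! 3
END PROGRAM
```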

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented by separating the allocation
of the temporary array from the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.
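
To make that API shape concrete, here is a hedged sketch in Fortran itself;
an assumed-shape `INTENT(OUT)` dummy stands in for the descriptor or pointer
an actual runtime entry point would take, and all names are invented:
```
PROGRAM destination_api_sketch
  IMPLICIT NONE
  REAL :: A(2,3), B(3,2)
  A = 1.0
  CALL eval_transpose_plus_one(B, A)   ! the caller owns and supplies the destination
  PRINT *, ALL(B == 2.0)               ! expected: T
CONTAINS
  ! Runtime-library-style entry point: the destination is passed in,
  ! already sized to the analyzed shape, and is populated in place.
  SUBROUTINE eval_transpose_plus_one(DEST, SRC)
    REAL, INTENT(OUT) :: DEST(:,:)
    REAL, INTENT(IN)  :: SRC(:,:)
    INTEGER :: J, K
    DO CONCURRENT(J=1:SIZE(SRC,2), K=1:SIZE(SRC,1))
      DEST(J,K) = SRC(K,J) + 1.0
    END DO
  END SUBROUTINE
END PROGRAM
```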

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of
stack storage when the scope of the variable is local.
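
As a sketch of that decomposition, here is roughly the sequence that
`ALLOCATE(B, SOURCE=TRANSPOSE(A)+1.0)` could be lowered to, written out in
Fortran itself (the names and shapes are only illustrative):
```
PROGRAM allocate_source_sketch
  IMPLICIT NONE
  REAL :: A(3,5)
  REAL, ALLOCATABLE :: B(:,:)
  INTEGER :: J, K, SHP(2)
  A = 1.0
  SHP = [SIZE(A,2), SIZE(A,1)]      ! 1. compute the analyzed shape
  ALLOCATE(B(SHP(1), SHP(2)))       ! 2. allocate storage, set up the descriptor
  DO CONCURRENT(J=1:SHP(1), K=1:SHP(2))
    B(J,K) = A(K,J) + 1.0           ! 3. populate directly; no temporary
  END DO
  PRINT *, SHAPE(B), ALL(B == 2.0)  ! expected: 5 3 T
END PROGRAM
```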

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.
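
For reference, the feature itself in action (a minimal example):
```
PROGRAM auto_realloc
  IMPLICIT NONE
  REAL, ALLOCATABLE :: A(:)
  A = [1.0, 2.0]        ! allocates A with shape [2]
  PRINT *, SHAPE(A)     ! 2
  A = [1.0, 2.0, 3.0]   ! non-conforming right-hand side: A is reallocated
  PRINT *, SHAPE(A)     ! 3
END PROGRAM
```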

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}` (checked in the sketch after this list).
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If X is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.
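
A small check of the `TRANSPOSE` and `SPREAD` rules above (shapes and
values chosen arbitrarily, purely as a sketch):
```
PROGRAM rule_check
  IMPLICIT NONE
  REAL :: T(3,4), TT(4,3), X(4), S(4,2)
  INTEGER :: J, K
  T = RESHAPE([(REAL(J), J=1,12)], [3,4])
  X = [10.0, 20.0, 30.0, 40.0]
  TT = TRANSPOSE(T)                 ! TRANSPOSE(T){j,k} => T{k,j}
  S = SPREAD(X, DIM=2, NCOPIES=2)   ! SPREAD(X,DIM=2,...){j,k} => X{j}
  PRINT *, ALL([((TT(J,K) == T(K,J), J=1,4), K=1,3)])  ! expected: T
  PRINT *, ALL([((S(J,K)  == X(J),   J=1,4), K=1,2)])  ! expected: T
END PROGRAM
```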

(more...)