diff flang/docs/ArrayComposition.md @ 221:79ff65ed7e25
LLVM12 Original
author | Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date   | Tue, 15 Jun 2021 19:15:29 +0900

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/flang/docs/ArrayComposition.md	Tue Jun 15 19:15:29 2021 +0900
@@ -0,0 +1,216 @@

<!--===- docs/ArrayComposition.md

  Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  See https://llvm.org/LICENSE.txt for license information.
  SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Array Composition

```eval_rst
.. contents::
   :local:
```

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions, and
makes them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`, `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=)`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
The generic `REDUCE` is also not considered here.

## Arrays as functions

A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.

## Array expressions as functions

The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.
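
As a sketch of this rule (the names and extents below are made up, and the
loops are written by hand rather than emitted by a compiler), the first
assignment satisfies the condition and can be evaluated directly into its
target, while the second names its target on the right-hand side, so its
right-hand side must still be materialized first:
```
PROGRAM DEMO_NO_TEMPORARY
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 4
  REAL :: A(N,M), B(N,M), T(N,M)
  INTEGER :: J, K
  CALL RANDOM_NUMBER(A)
  ! B = A + 1.0: B does not appear on the right-hand side and there are no
  ! non-elemental function calls, so the expression can be evaluated
  ! directly into B, one element at a time.
  DO CONCURRENT(J=1:N,K=1:M)
    B(J,K) = A(J,K) + 1.0
  END DO
  ! A = A(:,M:1:-1) + 1.0: A appears on both sides, so evaluating the
  ! right-hand side directly into A would read elements that had already
  ! been overwritten; the right-hand side is materialized first
  ! (here explicitly, into T).
  T = A(:,M:1:-1) + 1.0
  A = T
  PRINT *, B(1,1), A(1,1)
END PROGRAM
```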

## Transformational intrinsic functions as function composition

Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.

Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.

More completely:

* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `s`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.

## Determination of rank and shape

An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.

The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.
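
These correspondences can be spot-checked at the Fortran level. The sketch
below (with arbitrary names and extents) evaluates `TRANSPOSE(A) + 1.0` and
`SUM(A,DIM=2)` both with the intrinsics and as the composed index functions
described above; the declared result shapes follow the rules just listed.
```
PROGRAM DEMO_COMPOSITION
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 4
  REAL :: A(N,M), B1(M,N), B2(M,N), S1(N), S2(N)
  INTEGER :: J, K
  CALL RANDOM_NUMBER(A)
  ! TRANSPOSE(A) + 1.0 via the intrinsic; the result's shape is [M,N],
  ! the reversal of SHAPE(A).
  B1 = TRANSPOSE(A) + 1.0
  ! The same result as a composed index function: element (J,K) of the
  ! result is A(K,J) + 1.0, with no temporary for the transposition.
  DO CONCURRENT(J=1:M,K=1:N)
    B2(J,K) = A(K,J) + 1.0
  END DO
  ! SUM(A,DIM=2) via the intrinsic; the result's shape is [N],
  ! i.e. SHAPE(A) with element 2 removed.
  S1 = SUM(A,DIM=2)
  ! The same axial reduction applied to index J is the complete
  ! reduction SUM(A(J,:)).
  DO CONCURRENT(J=1:N)
    S2(J) = SUM(A(J,:))
  END DO
  PRINT *, MAXVAL(ABS(B1-B2)), MAXVAL(ABS(S1-S2))  ! both expected to be 0.
END PROGRAM
```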

This ability to determine shapes in advance is useful because expression
evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array and the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of stack storage when the
variable's scope is local.

## Automatic reallocation of allocatables

Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.

## Rewriting rules

Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k}` => `A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If X is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.

(more...)
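
A few of the rewriting rules above can likewise be checked directly in
Fortran. The sketch below (with arbitrary names and extents) compares an
array section, a vector-valued subscript, and `SPREAD` against their
rewritten index forms:
```
PROGRAM DEMO_REWRITES
  IMPLICIT NONE
  REAL :: A(10), X(5)
  REAL :: SECTION(3), GATHERED(4), SPREADX(5,4)
  INTEGER :: IDX(4), J, K
  LOGICAL :: OK
  CALL RANDOM_NUMBER(A)
  CALL RANDOM_NUMBER(X)
  IDX = [7, 1, 4, 10]
  OK = .TRUE.
  ! Array sections simply apply indices: A(2:8:3){j} => A(2+3*(j-1)).
  SECTION = A(2:8:3)
  DO J = 1, SIZE(SECTION)
    OK = OK .AND. SECTION(J) == A(2+3*(J-1))
  END DO
  ! Vector-valued subscripts apply indices to the subscript:
  ! A(IDX(:)){j} => A(IDX(:){j}).
  GATHERED = A(IDX)
  DO J = 1, SIZE(IDX)
    OK = OK .AND. GATHERED(J) == A(IDX(J))
  END DO
  ! SPREAD(X,DIM=2,...){j,k} => X{j}: the contents are replicated.
  SPREADX = SPREAD(X, DIM=2, NCOPIES=4)
  DO K = 1, 4
    DO J = 1, 5
      OK = OK .AND. SPREADX(J,K) == X(J)
    END DO
  END DO
  PRINT *, 'all checks passed: ', OK
END PROGRAM
```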