<!--===- documentation/C++17.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

## C++14/17 features used in f18

The C++ dialect used in this project constitutes a subset of the
standard C++ programming language and library features.
We want our dialect to be compatible with the LLVM C++ language
subset that will be in use at the time that we integrate with that
project.
We also want to maximize portability, future-proofing,
compile-time error checking, and use of best practices.

To that end, we have a C++ style guide (q.v.) that lays
out the details of how our C++ code should look and gives
guidance about feature usage.

We have chosen to use some features of the recent C++17
language standard in f18.
The most important of these are:
* sum types (discriminated unions) in the form of `std::variant`
* expansion of template parameter packs in `using` declarations
* generic lambdas with `auto` argument types
* product types in the form of `std::tuple`
* `std::optional`

(`std::tuple` is actually a C++11 feature, and generic lambdas arrived
in C++14; `std::tuple` is included in this list because it's not
particularly well known.)

### Sum types

First, some background information to explain the need for sum types
in f18.

Fortran is notoriously problematic to lex and parse, as tokenization
depends on the state of the partial parse;
the language has no reserved words in the sense that C++ does.
Fortran parsers implemented with distinct lexing and parsing phases
(generated by hand or with tools) need to implement them as
coroutines with complicated state, and experience has shown that
it's hard to get them right and harder to extend them as the language
evolves.

Alternatively, with the use of backtracking, one can parse Fortran with
a unified lexer/parser.
We have chosen to do so because it is simpler and should reduce
both initial bugs and long-term maintenance.

Specifically, f18's parser uses the technique of recursive descent with
backtracking.
It is constructed as the incremental composition of pure parsing functions
that each, when given a context (location in the input stream plus some state),
either _succeeds_ or _fails_ to recognize some piece of Fortran.
On success, a parsing function returns a new state and a semantic value,
which is usually an instance of a C++ `struct` type that encodes the semantic
content of a production in the Fortran grammar.

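A minimal sketch of such a pure parsing function follows.
It is not f18's actual code: `ParseState`, `ParseResult`, `ParseDigits`, and
`FirstOf` are hypothetical names chosen for illustration, and the context
here is reduced to an offset into the input.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical context: the input plus the current location in it.
struct ParseState {
  std::string_view input;
  std::size_t at{0};
};

// Hypothetical result: a semantic value paired with the state after the match.
template <typename T> struct ParseResult {
  T value;
  ParseState next;
};

// Recognizes one or more decimal digits. The state is taken by value, so the
// caller's state is untouched whether the parse succeeds or fails.
std::optional<ParseResult<int>> ParseDigits(ParseState state) {
  const std::size_t start{state.at};
  int value{0};
  while (state.at < state.input.size() &&
      std::isdigit(static_cast<unsigned char>(state.input[state.at]))) {
    value = 10 * value + (state.input[state.at] - '0');
    ++state.at;
  }
  if (state.at == start) {
    return std::nullopt; // failure: nothing was recognized
  }
  return ParseResult<int>{value, state};
}

// Backtracking between two alternatives that yield the same result type:
// if the first parser fails, the second is tried from the same state.
template <typename PA, typename PB>
auto FirstOf(ParseState state, PA first, PB second) {
  if (auto result{first(state)}) {
    return result;
  }
  return second(state);
}
```

Because a failed parse leaves the caller's `ParseState` unchanged, backtracking
in this sketch amounts to calling the next alternative with the original state.
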
This technique allows us to specify both the Fortran grammar and the
representation of successfully parsed programs with C++ code
whose functions and data structures correspond closely to the productions
of Fortran.

The specification of Fortran uses a form of BNF with alternatives,
optional elements, sequences, and lists. Each of these constructs
in the Fortran grammar maps directly in the f18 parser both to
the means of combining other parsers as alternatives, &c., and to
the declarations of the parse tree data structures that represent
the results of successful parses.
Move semantics are used in the parsing functions to acquire and
combine the results of sub-parses into the result of a larger
parse.

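The data structure side of that mapping can be sketched as below.
The node types (`Name`, `IntLiteral`, `Operand`, `Item`, `ItemList`) and the
helper `MakeItem` belong to a made-up toy grammar, not to Fortran or to f18;
the choice of `std::list` for lists is likewise an assumption.

```cpp
#include <list>
#include <optional>
#include <string>
#include <tuple>
#include <utility>
#include <variant>

struct Name {
  std::string text;
};
struct IntLiteral {
  int value;
};

// Alternatives:  operand -> name | int-literal
struct Operand {
  std::variant<Name, IntLiteral> u;
};

// Sequence with an optional element:  item -> [label] operand
struct Item {
  std::tuple<std::optional<Name>, Operand> t;
};

// List:  item-list -> item [, item]...
struct ItemList {
  std::list<Item> v;
};

// The results of two sub-parses are moved, not copied, into the larger node.
Item MakeItem(std::optional<Name> &&label, Operand &&operand) {
  return Item{std::make_tuple(std::move(label), std::move(operand))};
}
```

Each grammar construct picks its container: `std::variant` for alternatives,
`std::optional` for optional elements, `std::tuple` for sequences, and a
standard container for lists.
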
To represent nodes in the Fortran parse tree, we need a means of
handling sum types for productions that have multiple alternatives.
The bounded polymorphism supplied by the C++17 `std::variant` fits
those needs exactly.
For example, production R502 of the Fortran standard defines the top-level
program unit of Fortran as being a function, subroutine, module, &c.
The `struct ProgramUnit` in the f18 parse tree header file
represents each program unit with a member that is a `std::variant`
over the six possibilities.
Similarly, the parser for that type in the f18 grammar has six alternatives,
each of which constructs an instance of `ProgramUnit` upon the result of
parsing a `Module`, `FunctionSubprogram`, and so on.

Code that performs semantic analysis on the result of a successful
parse is typically implemented with overloaded functions.
An overload that takes a `ProgramUnit` will use `std::visit` to
identify the right alternative and perform the right actions.
The call to `std::visit` must pass a visitor that can handle all
of the possibilities, and f18 will fail to build if one is missing.

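As a rough illustration, a simplified stand-in for `ProgramUnit` and a visitor
over it might look like the following.
The alternatives are empty placeholder structs; apart from `Module` and
`FunctionSubprogram`, their names are assumptions taken from the wording of
R502, and `KindVisitor` and `KindOf` are hypothetical.
The real f18 declarations differ in detail.

```cpp
#include <string>
#include <variant>

// Placeholder node types; the real parse tree nodes carry content.
struct MainProgram {};
struct FunctionSubprogram {};
struct SubroutineSubprogram {};
struct Module {};
struct Submodule {};
struct BlockData {};

struct ProgramUnit {
  std::variant<MainProgram, FunctionSubprogram, SubroutineSubprogram, Module,
      Submodule, BlockData>
      u;
};

// One overload per alternative. Removing any of these makes the call to
// std::visit below ill-formed, so the omission is caught at build time.
struct KindVisitor {
  std::string operator()(const MainProgram &) const { return "main program"; }
  std::string operator()(const FunctionSubprogram &) const { return "function"; }
  std::string operator()(const SubroutineSubprogram &) const {
    return "subroutine";
  }
  std::string operator()(const Module &) const { return "module"; }
  std::string operator()(const Submodule &) const { return "submodule"; }
  std::string operator()(const BlockData &) const { return "block data"; }
};

std::string KindOf(const ProgramUnit &unit) {
  return std::visit(KindVisitor{}, unit.u);
}
```
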
Were we unable to use `std::variant` directly, we would likely
have chosen to implement a local `SumType` replacement; in the
absence of C++17's ability to expand a template parameter pack in a
`using` declaration, and of the generic lambdas with `auto` arguments
that C++14 introduced, it would be less convenient to use.

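The two language features combine in the well-known "overloaded lambdas"
idiom, sketched here.
`Visitors`, `Value`, and `Describe` are hypothetical names for this example and
are not claimed to match f18's own helpers.

```cpp
#include <string>
#include <variant>

// Inherits the call operator of every lambda passed to it.
template <typename... LAMBDAS> struct Visitors : LAMBDAS... {
  using LAMBDAS::operator()...; // C++17: pack expansion in a using-declaration
};
// Deduction guide so that Visitors{lambda1, lambda2} needs no template
// arguments.
template <typename... LAMBDAS> Visitors(LAMBDAS...) -> Visitors<LAMBDAS...>;

using Value = std::variant<int, double, std::string>;

std::string Describe(const Value &value) {
  return std::visit(
      Visitors{
          // Exact match for the std::string alternative.
          [](const std::string &s) { return "string " + s; },
          // Generic lambda (C++14) covering both numeric alternatives.
          [](const auto &number) {
            return "number " + std::to_string(number);
          },
      },
      value);
}
```

Without the pack expansion in the `using` declaration, the inherited call
operators would be ambiguous; without generic lambdas, every alternative
would need its own explicitly typed lambda.
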
The other options for polymorphism in C++ at the level of C++11
would be to:
* loosen up compile-time type safety and use a unified parse tree node
  representation with an enumeration type for an operator and generic
  subtree pointers, or
* define the sum types for the parse tree as abstract base classes from
  which each particular alternative would derive, and then use virtual
  functions (or the forbidden `dynamic_cast`) to identify alternatives
  during analysis

### Product types

Many productions in the Fortran grammar describe a sequence of various
sub-parses.
For example, R504 defines the things that may appear in the "specification
part" of a subprogram in the order in which they are allowed: `USE`
statements, then `IMPORT` statements, and so on.

The parse tree node that represents such a thing needs to incorporate
the representations of those parses, of course.
It turns out to be convenient to allow these data members to be anonymous
components of a `std::tuple` product type.
This type facilitates the automation of code that walks over all of the
members in a type-safe fashion and avoids the need to invent and remember
needless member names: the components of a `std::tuple` instance can
be identified and accessed in terms of their types, and those tend to be
distinct.

So we use `std::tuple` for such things.
It has also been handy for template metaprogramming that needs to work
with lists of types.

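Both points, access by type and generic traversal, can be sketched with a
much-reduced specification part.
The node types and the functions `UseStmts` and `CountStatements` are
hypothetical, and only two of the constituents that R504 allows are modeled.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <tuple>

struct UseStmt {
  std::string module;
};
struct ImportStmt {
  std::string name;
};

// The members are anonymous tuple components, kept in grammar order.
struct SpecificationPart {
  std::tuple<std::list<UseStmt>, std::list<ImportStmt>> t;
};

// A component is selected by its type, which is unique within the tuple.
const std::list<UseStmt> &UseStmts(const SpecificationPart &spec) {
  return std::get<std::list<UseStmt>>(spec.t);
}

// Walks every component without naming any of them: std::apply passes the
// tuple members as arguments, and the fold expression sums their sizes.
std::size_t CountStatements(const SpecificationPart &spec) {
  return std::apply(
      [](const auto &...lists) {
        return (std::size_t{0} + ... + lists.size());
      },
      spec.t);
}
```

Adding a third component to the tuple would require no change to
`CountStatements`, which is the kind of automation the text describes.
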
### `std::optional`

This simple little type is used wherever a value might or might not be
present.
It is especially useful for function results and
rvalue reference arguments.
It corresponds directly to the optional elements in the productions
of the Fortran grammar.
It is also used as a wrapper around a parse tree node type to define the
results of the various parsing functions, where presence of a value
signifies a successful recognition and absence denotes a failed parse.
It is used in data structures in place of nullable pointers to
avoid indirection as well as the possible confusion over whether a pointer
is allowed to be null.
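
A small sketch of an optional grammar element stored by value rather than
behind a nullable pointer is shown below.
`EndProgramStmt`, `MakeEndProgramStmt`, and `Render` are hypothetical names
for illustration, loosely modeled on an `END PROGRAM` statement whose
program name may be omitted.

```cpp
#include <optional>
#include <string>
#include <utility>

// The optional name lives inside the node: no allocation, no indirection,
// and the type itself documents that the name may be absent.
struct EndProgramStmt {
  std::optional<std::string> name;
};

// An rvalue reference argument lets the result of a sub-parse be moved in.
EndProgramStmt MakeEndProgramStmt(std::optional<std::string> &&name) {
  return EndProgramStmt{std::move(name)};
}

std::string Render(const EndProgramStmt &stmt) {
  return stmt.name ? "END PROGRAM " + *stmt.name : std::string{"END PROGRAM"};
}
```
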