annotate llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl03.rst @ 266:00f31e85ec16 default tip

Added tag current for changeset 31d058e83c98
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Sat, 14 Oct 2023 10:13:55 +0900 (2023-10-14)
parents 1f2b6ac9f198
children

========================================
Kaleidoscope: Code generation to LLVM IR
========================================

.. contents::
   :local:

Chapter 3 Introduction
======================

Welcome to Chapter 3 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. This chapter shows you how to transform
the `Abstract Syntax Tree <LangImpl02.html>`_, built in Chapter 2, into
LLVM IR. This will teach you a little bit about how LLVM does things, as
well as demonstrate how easy it is to use. It's much more work to build
a lexer and parser than it is to generate LLVM IR code. :)

**Please note**: the code in this chapter and later chapters requires LLVM 3.7
or later; LLVM 3.6 and earlier will not work with it. Also note that you
need to use a version of this tutorial that matches your LLVM release.
If you are using an official LLVM release, use the version of the
documentation included with your release or on the `llvm.org releases
page <https://llvm.org/releases/>`_.

Code Generation Setup
=====================

In order to generate LLVM IR, we want some simple setup to get started.
First we define virtual code generation (codegen) methods in each AST
class:

.. code-block:: c++

  /// ExprAST - Base class for all expression nodes.
  class ExprAST {
  public:
    virtual ~ExprAST() = default;
    virtual Value *codegen() = 0;
  };

  /// NumberExprAST - Expression class for numeric literals like "1.0".
  class NumberExprAST : public ExprAST {
    double Val;

  public:
    NumberExprAST(double Val) : Val(Val) {}
    Value *codegen() override;
  };
  ...

The codegen() method says to emit IR for that AST node along with all
the things it depends on, and all of these methods return an LLVM Value object.
"Value" is the class used to represent a "`Static Single Assignment
(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
register" or "SSA value" in LLVM. The most distinctive aspect of SSA values
is that their value is computed as the related instruction executes, and
it does not get a new value until (and if) the instruction re-executes.
In other words, there is no way to "change" an SSA value. For more
information, please read up on `Static Single
Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_;
the concepts are really quite natural once you grok them.

Note that instead of adding virtual methods to the ExprAST class
hierarchy, it could also make sense to use a `visitor
pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other
way to model this. Again, this tutorial won't dwell on good software
engineering practices: for our purposes, adding a virtual method is
simplest.
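
For comparison, here is a minimal, LLVM-free sketch of the visitor alternative. All names in it (``Expr``, ``ExprVisitor``, ``PrefixPrinter`` and so on) are hypothetical and not part of the tutorial's code; to keep it runnable without LLVM, the visitor returns a ``std::string`` rather than a ``Value *``. The AST classes only know how to ``accept`` a visitor, and each pass over the tree becomes its own visitor class.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <string>
  #include <utility>

  // Hypothetical, LLVM-free sketch of the visitor pattern.
  struct NumberExpr;
  struct BinaryExpr;

  struct ExprVisitor {
    virtual ~ExprVisitor() = default;
    virtual std::string visit(const NumberExpr &E) = 0;
    virtual std::string visit(const BinaryExpr &E) = 0;
  };

  struct Expr {
    virtual ~Expr() = default;
    virtual std::string accept(ExprVisitor &V) const = 0;
  };

  struct NumberExpr : Expr {
    double Val;
    explicit NumberExpr(double Val) : Val(Val) {}
    std::string accept(ExprVisitor &V) const override;
  };

  struct BinaryExpr : Expr {
    char Op;
    std::unique_ptr<Expr> LHS, RHS;
    BinaryExpr(char Op, std::unique_ptr<Expr> LHS, std::unique_ptr<Expr> RHS)
        : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
    std::string accept(ExprVisitor &V) const override;
  };

  // Double dispatch: *this has the static type of the concrete node, so the
  // matching visit() overload is selected.
  std::string NumberExpr::accept(ExprVisitor &V) const { return V.visit(*this); }
  std::string BinaryExpr::accept(ExprVisitor &V) const { return V.visit(*this); }

  // One concrete pass: print the tree in prefix form. Numbers are printed as
  // integers only to keep the output short.
  struct PrefixPrinter : ExprVisitor {
    std::string visit(const NumberExpr &E) override {
      return std::to_string(static_cast<long long>(E.Val));
    }
    std::string visit(const BinaryExpr &E) override {
      return std::string("(") + E.Op + " " + E.LHS->accept(*this) + " " +
             E.RHS->accept(*this) + ")";
    }
  };

With this design, adding a new pass means writing a new visitor rather than touching every AST class; the trade-off is that adding a new node type means updating every visitor.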

The second thing we want is a "LogError" method like we used for the
parser, which will be used to report errors found during code generation
(for example, use of an undeclared parameter):

.. code-block:: c++

  // These must be created before any codegen() call.
  static std::unique_ptr<LLVMContext> TheContext;
  static std::unique_ptr<IRBuilder<>> Builder;
  static std::unique_ptr<Module> TheModule;
  static std::map<std::string, Value *> NamedValues;

  Value *LogErrorV(const char *Str) {
    LogError(Str);
    return nullptr;
  }

The static variables will be used during code generation. ``TheContext``
is an opaque object that owns a lot of core LLVM data structures, such as
the type and constant value tables. We don't need to understand it in
detail; we just need a single instance to pass into APIs that require it.

The ``Builder`` object is a helper object that makes it easy to generate
LLVM instructions. Instances of the
`IRBuilder <https://llvm.org/doxygen/IRBuilder_8h_source.html>`_
class template keep track of the current place to insert instructions
and have methods to create new instructions.

``TheModule`` is an LLVM construct that contains functions and global
variables. In many ways, it is the top-level structure that the LLVM IR
uses to contain code. It will own the memory for all of the IR that we
generate, which is why the codegen() method returns a raw Value\*,
rather than a unique_ptr<Value>.

The ``NamedValues`` map keeps track of which values are defined in the
current scope and what their LLVM representation is. (In other words, it
is a symbol table for the code). In this form of Kaleidoscope, the only
things that can be referenced are function parameters. As such, function
parameters will be in this map when generating code for their function
body.

With these basics in place, we can start talking about how to generate
code for each expression. Note that this assumes that the ``Builder``
has been set up to generate code *into* something. For now, we'll assume
that this has already been done, and we'll just use it to emit code.

Expression Code Generation
==========================

Generating LLVM code for expression nodes is very straightforward: less
than 45 lines of commented code for all four of our expression nodes.
First we'll do numeric literals:

.. code-block:: c++

  Value *NumberExprAST::codegen() {
    return ConstantFP::get(*TheContext, APFloat(Val));
  }

In the LLVM IR, numeric constants are represented with the
``ConstantFP`` class, which holds the numeric value in an ``APFloat``
internally (``APFloat`` has the capability of holding floating point
constants of Arbitrary Precision). This code basically just creates
and returns a ``ConstantFP``. Note that in the LLVM IR, constants
are all uniqued together and shared. For this reason, the API uses the
"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)".
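
To make the uniquing idea concrete, the following LLVM-free sketch models a context that hands out constants through a ``get``-style method. ``ToyContext`` and ``ToyConstantFP`` are hypothetical stand-ins for ``LLVMContext`` and ``ConstantFP``, not their real implementations; the point is only that asking twice for the same value returns the same object, which is why a ``get`` method fits better than ``new``.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <map>
  #include <memory>
  #include <utility>

  // Hypothetical stand-in for ConstantFP: only the context can create one.
  class ToyConstantFP {
  public:
    double getValue() const { return Val; }

  private:
    friend class ToyContext;
    explicit ToyConstantFP(double V) : Val(V) {}
    double Val;
  };

  // Hypothetical stand-in for LLVMContext: it owns every constant and hands
  // out the existing object when the same value is requested again.
  class ToyContext {
  public:
    const ToyConstantFP *getConstantFP(double V) {
      auto It = Constants.find(V);
      if (It == Constants.end()) {
        std::unique_ptr<ToyConstantFP> C(new ToyConstantFP(V));
        It = Constants.emplace(V, std::move(C)).first;
      }
      return It->second.get();
    }

    std::size_t numConstants() const { return Constants.size(); }

  private:
    // Keyed by double for simplicity. A real implementation would compare the
    // exact bit pattern, so that, for example, 0.0 and -0.0 stay distinct.
    std::map<double, std::unique_ptr<ToyConstantFP>> Constants;
  };

Because equal constants share one object, two constants can be compared for equality simply by comparing pointers.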

.. code-block:: c++

  Value *VariableExprAST::codegen() {
    // Look this variable up in the function.
    Value *V = NamedValues[Name];
    if (!V)
      LogErrorV("Unknown variable name");
    return V;
  }

References to variables are also quite simple using LLVM. In the simple
version of Kaleidoscope, we assume that the variable has already been
emitted somewhere and its value is available. In practice, the only
values that can be in the ``NamedValues`` map are function arguments.
This code simply checks to see that the specified name is in the map (if
not, an unknown variable is being referenced) and returns the value for
it. In future chapters, we'll add support for `loop induction
variables <LangImpl05.html#for-loop-expression>`_ in the symbol table, and for `local
variables <LangImpl07.html#user-defined-local-variables>`_.
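
One detail worth knowing: ``NamedValues[Name]`` uses ``std::map::operator[]``, which inserts a null entry when ``Name`` is missing. That is harmless here, because a later lookup of the same name still yields null, but a lookup that leaves the table untouched can be written with ``find``. The sketch below is a hypothetical helper, not tutorial code; it uses a made-up ``ToyValue`` type in place of ``llvm::Value`` so that it runs without LLVM.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>

  // Hypothetical stand-in for llvm::Value; the sketch only needs an address.
  struct ToyValue {
    std::string Name;
  };

  // Look a name up without modifying the table. Unlike Table[Name],
  // std::map::find does not insert a null entry when the name is missing.
  ToyValue *lookupVariable(const std::map<std::string, ToyValue *> &Table,
                           const std::string &Name) {
    auto It = Table.find(Name);
    return It == Table.end() ? nullptr : It->second;
  }

By contrast, evaluating ``Table["y"]`` on the same map would grow it by one entry holding a null pointer.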

.. code-block:: c++

  Value *BinaryExprAST::codegen() {
    Value *L = LHS->codegen();
    Value *R = RHS->codegen();
    if (!L || !R)
      return nullptr;

    switch (Op) {
    case '+':
      return Builder->CreateFAdd(L, R, "addtmp");
    case '-':
      return Builder->CreateFSub(L, R, "subtmp");
    case '*':
      return Builder->CreateFMul(L, R, "multmp");
    case '<':
      L = Builder->CreateFCmpULT(L, R, "cmptmp");
      // Convert bool 0/1 to double 0.0 or 1.0
      return Builder->CreateUIToFP(L, Type::getDoubleTy(*TheContext),
                                   "booltmp");
    default:
      return LogErrorV("invalid binary operator");
    }
  }

Binary operators start to get more interesting. The basic idea here is
that we recursively emit code for the left-hand side of the expression,
then the right-hand side, then we compute the result of the binary
expression. In this code, we do a simple switch on the opcode to create
the right LLVM instruction.

In the example above, the LLVM builder class is starting to show its
value. IRBuilder knows where to insert the newly created instruction;
all you have to do is specify what instruction to create (e.g. with
``CreateFAdd``), which operands to use (``L`` and ``R`` here) and
optionally provide a name for the generated instruction.

One nice thing about LLVM is that the name is just a hint. For instance,
if the code above emits multiple "addtmp" variables, LLVM will
automatically provide each one with an increasing, unique numeric
suffix. Local value names for instructions are purely optional, but they
make the IR dumps much easier to read.
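
The renaming behaviour can be modelled with a small symbol table. This is a hypothetical, LLVM-free sketch (``ToyNameTable`` is not an LLVM class): a requested name is treated as a hint, and a collision gets a numeric suffix. LLVM's own suffix scheme may differ in detail, so only the general idea should be taken from it.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <set>
  #include <string>

  // Hypothetical toy symbol table: names are hints, and collisions are
  // resolved by appending an increasing numeric suffix.
  class ToyNameTable {
  public:
    // Returns the name actually assigned. An empty hint stays unnamed
    // (in textual LLVM IR such values appear as %0, %1, ...).
    std::string assign(const std::string &Hint) {
      if (Hint.empty())
        return Hint;
      std::string Name = Hint;
      unsigned &Counter = NextSuffix[Hint];
      while (Used.count(Name))
        Name = Hint + std::to_string(++Counter);
      Used.insert(Name);
      return Name;
    }

  private:
    std::set<std::string> Used;                // names already taken
    std::map<std::string, unsigned> NextSuffix; // last suffix used per hint
  };

Requesting "addtmp" three times yields "addtmp", "addtmp1" and "addtmp2", while a fresh hint such as "multmp" is kept unchanged.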

`LLVM instructions <../../LangRef.html#instruction-reference>`_ are constrained by strict
rules: for example, the left and right operands of an `add
instruction <../../LangRef.html#add-instruction>`_ must have the same type, and the
result type of the add must match the operand types. The same holds for the
floating-point ``fadd``, ``fsub`` and ``fmul`` instructions emitted above. Because all values
in Kaleidoscope are doubles, this makes for very simple code for add,
sub and mul.

On the other hand, LLVM specifies that the `fcmp
instruction <../../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
one-bit integer). The problem with this is that Kaleidoscope wants the
value to be a 0.0 or 1.0 value. In order to get these semantics, we
combine the fcmp instruction with a `uitofp
instruction <../../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
input integer into a floating point value by treating the input as an
unsigned value. In contrast, if we used the `sitofp
instruction <../../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
would return 0.0 and -1.0, depending on the input value.
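
The difference between the two conversions comes down to how a single bit is read. The LLVM-free sketch below (the function names are hypothetical) models an ``i1`` as a ``bool``: read as unsigned, a set bit is 1; read as a signed two's-complement number, the only bit is the sign bit, so a set bit is -1.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Model of uitofp on an i1: the bit is read as an unsigned number (0 or 1).
  double uitofpI1(bool Bit) { return Bit ? 1.0 : 0.0; }

  // Model of sitofp on an i1: in a 1-bit two's-complement integer the only
  // bit is the sign bit, so sign-extending a set bit gives all ones, i.e. -1.
  double sitofpI1(bool Bit) {
    std::int64_t Extended = Bit ? -1 : 0;
    return static_cast<double>(Extended);
  }

With ``uitofp`` a true comparison becomes 1.0, matching Kaleidoscope's expectations, whereas ``sitofp`` would turn it into -1.0.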

.. code-block:: c++

  Value *CallExprAST::codegen() {
    // Look up the name in the global module table.
    Function *CalleeF = TheModule->getFunction(Callee);
    if (!CalleeF)
      return LogErrorV("Unknown function referenced");

    // If argument mismatch error.
    if (CalleeF->arg_size() != Args.size())
      return LogErrorV("Incorrect # arguments passed");

    std::vector<Value *> ArgsV;
    for (unsigned i = 0, e = Args.size(); i != e; ++i) {
      ArgsV.push_back(Args[i]->codegen());
      if (!ArgsV.back())
        return nullptr;
    }

    return Builder->CreateCall(CalleeF, ArgsV, "calltmp");
  }

Code generation for function calls is quite straightforward with LLVM. The code
above initially does a function name lookup in the LLVM Module's symbol table.
Recall that the LLVM Module is the container that holds the functions we are
JIT'ing. By giving each function the same name as what the user specifies, we
can use the LLVM symbol table to resolve function names for us.

Once we have the function to call, we recursively codegen each argument
that is to be passed in, and create an LLVM `call
instruction <../../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
calling conventions by default, allowing these calls to also call into
standard library functions like "sin" and "cos", with no additional
effort.
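
The two checks at the top of ``CallExprAST::codegen()`` can be modelled without LLVM. In this hypothetical sketch, ``ToyModule`` stands in for the Module's symbol table and records only each function's parameter count, which is the whole signature when every value is a double.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <map>
  #include <string>

  // Hypothetical stand-in for the Module's function symbol table:
  // function name -> number of (double) parameters.
  struct ToyModule {
    std::map<std::string, std::size_t> Functions;

    // Mirrors the two checks in CallExprAST::codegen(): the callee must
    // exist, and the argument count must match. Returns an empty string on
    // success, otherwise the error message.
    std::string checkCall(const std::string &Callee, std::size_t NumArgs) const {
      auto It = Functions.find(Callee);
      if (It == Functions.end())
        return "Unknown function referenced";
      if (It->second != NumArgs)
        return "Incorrect # arguments passed";
      return "";
    }
  };

Only after both checks pass does the real code generate the arguments and emit the ``call`` instruction.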

This wraps up our handling of the four basic expressions that we have so
far in Kaleidoscope. Feel free to go in and add some more. For example,
by browsing the `LLVM language reference <../../LangRef.html>`_ you'll find
several other interesting instructions that are really easy to plug into
our basic framework.

Function Code Generation
========================

Code generation for prototypes and functions must handle a number of
details, which make their code less beautiful than expression code
generation, but allow us to illustrate some important points. First,
let's talk about code generation for prototypes: they are used both for
function bodies and external function declarations. The code starts
with:

.. code-block:: c++

  Function *PrototypeAST::codegen() {
    // Make the function type:  double(double,double) etc.
    std::vector<Type*> Doubles(Args.size(),
                               Type::getDoubleTy(*TheContext));
    FunctionType *FT =
      FunctionType::get(Type::getDoubleTy(*TheContext), Doubles, false);

    Function *F =
      Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get());

This code packs a lot of power into a few lines. Note first that this
function returns a "Function\*" instead of a "Value\*". Because a
"prototype" really talks about the external interface for a function
(not the value computed by an expression), it makes sense for it to
return the LLVM Function it corresponds to when codegen'd.

The call to ``FunctionType::get`` creates the ``FunctionType`` that
should be used for a given Prototype. Since all function arguments in
Kaleidoscope are of type double, the first line creates a vector of "N"
LLVM double types. It then uses the ``FunctionType::get`` method to
create a function type that takes "N" doubles as arguments, returns one
double as a result, and that is not vararg (the false parameter
indicates this). Note that Types in LLVM are uniqued just like Constants
are, so you don't "new" a type, you "get" it.

The final line above actually creates the IR Function corresponding to
the Prototype. This indicates the type, linkage and name to use, as
well as which module to insert into. "`external
linkage <../../LangRef.html#linkage>`_" means that the function may be
defined outside the current module and/or that it is callable by
functions outside the module. The Name passed in is the name the user
specified: since ``TheModule`` is specified, this name is registered
in ``TheModule``'s symbol table.

.. code-block:: c++

    // Set names for all arguments.
    unsigned Idx = 0;
    for (auto &Arg : F->args())
      Arg.setName(Args[Idx++]);

    return F;

Finally, we set the name of each of the function's arguments according to the
names given in the Prototype. This step isn't strictly necessary, but keeping
the names consistent makes the IR more readable, and allows subsequent code to
refer directly to the arguments by their names, rather than having to look
them up in the Prototype AST.

At this point we have a function prototype with no body. This is how LLVM IR
represents function declarations. For extern statements in Kaleidoscope, this
is as far as we need to go. For function definitions, however, we need to
codegen and attach a function body.

.. code-block:: c++

  Function *FunctionAST::codegen() {
    // First, check for an existing function from a previous 'extern' declaration.
    Function *TheFunction = TheModule->getFunction(Proto->getName());

    if (!TheFunction)
      TheFunction = Proto->codegen();

    if (!TheFunction)
      return nullptr;

    if (!TheFunction->empty())
      return (Function*)LogErrorV("Function cannot be redefined.");

For function definitions, we start by searching TheModule's symbol table for an
existing version of this function, in case one has already been created using an
'extern' statement. If ``Module::getFunction`` returns null then no previous version
exists, so we'll codegen one from the Prototype. In either case, we want to
make sure that the function is empty (i.e. has no body yet) before we start;
if it already has a body, we report a redefinition error.

.. code-block:: c++

    // Create a new basic block to start insertion into.
    BasicBlock *BB = BasicBlock::Create(*TheContext, "entry", TheFunction);
    Builder->SetInsertPoint(BB);

    // Record the function arguments in the NamedValues map.
    NamedValues.clear();
    for (auto &Arg : TheFunction->args())
      NamedValues[std::string(Arg.getName())] = &Arg;

Now we get to the point where the ``Builder`` is set up. The first line
creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_
(named "entry"), which is inserted into ``TheFunction``. The second line
then tells the builder that new instructions should be inserted into the
end of the new basic block. Basic blocks in LLVM are an important part
of functions that define the `Control Flow
Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
don't have any control flow, our functions will only contain one block
at this point. We'll fix this in `Chapter 5 <LangImpl05.html>`_ :).

Next we add the function arguments to the NamedValues map (after first clearing
it out) so that they're accessible to ``VariableExprAST`` nodes.
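
As a standalone aside (separate from ``FunctionAST::codegen()``, whose remaining lines follow), here is a hypothetical, LLVM-free model of what setting an insertion point means: the builder remembers one block and appends each new instruction to its end. ``ToyBlock``, ``ToyFunction`` and ``ToyBuilder`` are illustrative names only, and instructions are plain strings.

.. code-block:: c++

  #include <cassert>
  #include <list>
  #include <string>
  #include <vector>

  struct ToyBlock {
    std::string Name;
    std::vector<std::string> Insts; // instructions, in order
  };

  struct ToyFunction {
    // std::list keeps block addresses stable as more blocks are added.
    std::list<ToyBlock> Blocks;

    // Loosely analogous to BasicBlock::Create(Ctx, "entry", F): append a
    // new, empty block to the function and return it.
    ToyBlock *createBlock(const std::string &Name) {
      Blocks.push_back(ToyBlock{Name, {}});
      return &Blocks.back();
    }
  };

  class ToyBuilder {
  public:
    // Loosely analogous to IRBuilder's SetInsertPoint(BB).
    void setInsertPoint(ToyBlock *BB) { InsertBlock = BB; }

    // Append an instruction at the end of the current insertion block.
    void create(const std::string &Inst) {
      assert(InsertBlock && "no insertion point set");
      InsertBlock->Insts.push_back(Inst);
    }

  private:
    ToyBlock *InsertBlock = nullptr;
  };

Once the insertion point is set, callers never say *where* an instruction goes, only *what* it is, which is the convenience the real ``IRBuilder`` provides.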

.. code-block:: c++

    if (Value *RetVal = Body->codegen()) {
      // Finish off the function.
      Builder->CreateRet(RetVal);

      // Validate the generated code, checking for consistency.
      verifyFunction(*TheFunction);

      return TheFunction;
    }

Once the insertion point has been set up and the NamedValues map populated,
we call the ``codegen()`` method for the root expression of the function. If no
error happens, this emits code to compute the expression into the entry block
and returns the value that was computed. Assuming no error, we then create an
LLVM `ret instruction <../../LangRef.html#ret-instruction>`_, which completes the function.
Once the function is built, we call ``verifyFunction``, which is
provided by LLVM. This function does a variety of consistency checks on
the generated code, to determine if our compiler is doing everything
right. Using this is important: it can catch a lot of bugs. Note that
``verifyFunction`` returns ``true`` when it *finds* a problem; the code
above ignores the result for brevity. Once the
function is finished and validated, we return it.
anatofuz
parents:
diff changeset
392
anatofuz
parents:
diff changeset
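Here is a rough illustration of the kind of rule ``verifyFunction`` enforces: a toy C++ checker over a made-up IR in which blocks are lists of opcode strings. It implements only one real LLVM rule, that every basic block ends with a terminator and has none before its last instruction. The actual verifier checks far more. ``ToyFunction`` and ``toyVerifyFunction`` are hypothetical names; the return convention (``true`` means broken) is borrowed from ``verifyFunction``.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <ostream>
  #include <sstream>
  #include <string>
  #include <vector>

  // Toy IR (hypothetical): a block is a list of opcode names.
  struct ToyBlock {
    std::string Name;
    std::vector<std::string> Opcodes; // e.g. {"fmul", "fadd", "ret"}
  };

  struct ToyFunction {
    std::string Name;
    std::vector<ToyBlock> Blocks; // no blocks == a declaration, like "declare"
  };

  static bool isTerminator(const std::string &Op) {
    return Op == "ret" || Op == "br";
  }

  // Checks one rule in the spirit of llvm::verifyFunction: every block must
  // end with a terminator and contain no terminator before its last slot.
  // Returns true if the function is broken; writes details to OS if non-null.
  bool toyVerifyFunction(const ToyFunction &F, std::ostream *OS = nullptr) {
    bool Broken = false;
    for (const ToyBlock &BB : F.Blocks) {
      for (std::size_t I = 0; I + 1 < BB.Opcodes.size(); ++I) {
        if (isTerminator(BB.Opcodes[I])) {
          Broken = true;
          if (OS)
            *OS << F.Name << ": terminator in the middle of '" << BB.Name << "'\n";
        }
      }
      if (BB.Opcodes.empty() || !isTerminator(BB.Opcodes.back())) {
        Broken = true;
        if (OS)
          *OS << F.Name << ": block '" << BB.Name << "' has no terminator\n";
      }
    }
    return Broken;
  }

Forgetting the ``Builder->CreateRet(RetVal)`` call is exactly the kind of mistake this rule catches. A function whose entry block ends in ``fadd`` is reported as broken.
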
.. code-block:: c++

    // Error reading body, remove function.
    TheFunction->eraseFromParent();
    return nullptr;
  }

The only piece left here is handling the error case. For simplicity,
we handle this by deleting the partially built function with the
``eraseFromParent`` method. This allows the user to redefine a function
that they incorrectly typed in before: if we didn't delete it, it would
live in the symbol table, with a body, preventing future redefinition.

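The effect of ``eraseFromParent`` on later redefinitions can be modeled with a toy module that owns its functions in a map. This is a hypothetical sketch, not LLVM code. ``ToyModule``, ``ToyFunction`` and ``codegenDefinition`` are made-up names, and ``BodyOk`` stands in for whether ``Body->codegen()`` succeeded.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <memory>
  #include <string>

  // Hypothetical stand-ins for llvm::Function / llvm::Module.
  struct ToyFunction {
    std::string Name;
    bool HasBody = false;
  };

  struct ToyModule {
    std::map<std::string, std::unique_ptr<ToyFunction>> Functions;

    ToyFunction *getFunction(const std::string &Name) {
      auto It = Functions.find(Name);
      return It == Functions.end() ? nullptr : It->second.get();
    }
    // The module owns every function it creates.
    ToyFunction *createFunction(const std::string &Name) {
      auto &Slot = Functions[Name];
      Slot = std::make_unique<ToyFunction>();
      Slot->Name = Name;
      return Slot.get();
    }
    // Analogue of TheFunction->eraseFromParent(): unlink and destroy.
    void eraseFunction(ToyFunction *F) {
      std::string Key = F->Name; // copy first: erasing destroys *F
      Functions.erase(Key);
    }
  };

  // Same control flow as FunctionAST::codegen().
  ToyFunction *codegenDefinition(ToyModule &M, const std::string &Name,
                                 bool BodyOk) {
    ToyFunction *F = M.getFunction(Name);
    if (!F)
      F = M.createFunction(Name);
    if (F->HasBody)
      return nullptr; // the tutorial logs "Function cannot be redefined."
    if (BodyOk) {
      F->HasBody = true;
      return F;
    }
    M.eraseFunction(F); // error reading body, remove function
    return nullptr;
  }

After a failed body, ``foo`` is gone from the module, so the user's corrected definition succeeds. A second successful definition of the same name is still rejected.
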
This code does have a bug, though: if the ``FunctionAST::codegen()`` method
finds an existing IR Function, it does not validate its signature against the
definition's own prototype. This means that an earlier 'extern' declaration will
take precedence over the function definition's signature, which can cause
codegen to fail, for instance if the function arguments are named differently.
There are a number of ways to fix this bug; see what you can come up with! Here
is a testcase:

::

    extern foo(a);     # ok, declares foo.
    def foo(b) b;      # Error: Unknown variable name. (decl using 'a' takes precedence).

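Here is one possible fix, sketched as a toy model rather than LLVM code. When a definition finds an existing declaration, it checks that the argument counts match. It then renames the declaration's arguments to the definition's own names before populating ``NamedValues``. All names here (``ToyProto``, ``ToyIRFunction``, ``defineFunction``) are hypothetical, and the body is reduced to a single variable reference. In the real code, a similar approach could compare ``TheFunction->arg_size()`` with the prototype's argument count and call ``setName`` on each ``llvm::Argument``.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <map>
  #include <string>
  #include <vector>

  // Hypothetical toy model; strings stand in for llvm::Argument names.
  struct ToyProto {
    std::string Name;
    std::vector<std::string> Args;
  };

  struct ToyIRFunction {
    std::vector<std::string> ArgNames; // names attached to the IR arguments
    bool HasBody = false;
  };

  using ToyModule = std::map<std::string, ToyIRFunction>;

  // Defines P with a body that just returns BodyVar (like "def foo(b) b;").
  // Returns an error message, or an empty string on success.
  std::string defineFunction(ToyModule &M, const ToyProto &P,
                             const std::string &BodyVar) {
    auto It = M.find(P.Name);
    if (It == M.end())
      It = M.emplace(P.Name, ToyIRFunction{P.Args, false}).first;
    ToyIRFunction &F = It->second;
    if (F.HasBody)
      return "Function cannot be redefined.";

    // The fix: reject an arity mismatch, then adopt the definition's names.
    if (F.ArgNames.size() != P.Args.size())
      return "Function redefined with a different number of arguments.";
    F.ArgNames = P.Args;

    // Populate NamedValues from the IR argument names, as the tutorial does.
    std::map<std::string, std::size_t> NamedValues;
    for (std::size_t I = 0; I < F.ArgNames.size(); ++I)
      NamedValues[F.ArgNames[I]] = I;
    if (NamedValues.count(BodyVar) == 0)
      return "Unknown variable name";
    F.HasBody = true;
    return "";
  }

With this change, the testcase's ``def foo(b) b;`` succeeds after ``extern foo(a);``. A definition whose argument count disagrees with an earlier ``extern`` is rejected.
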
Driver Changes and Closing Thoughts
===================================

For now, code generation to LLVM doesn't really get us much, except that
we can look at the pretty-printed IR. The sample code inserts calls to
``codegen`` into the ``HandleDefinition``, ``HandleExtern``, etc.
functions, and then dumps out the LLVM IR. This gives a nice way to look
at the LLVM IR for simple functions. For example:

::

  ready> 4+5;
  Read top-level expression:
  define double @0() {
  entry:
    ret double 9.000000e+00
  }

Note how the parser turns top-level expressions into anonymous
functions for us. This will be handy when we add `JIT
support <LangImpl04.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the
code is very literally transcribed: no optimizations are performed
except the simple constant folding done by IRBuilder (which is why ``4+5``
became the constant ``9.000000e+00``). We will `add
optimizations <LangImpl04.html#trivial-constant-folding>`_ explicitly in the next
chapter.

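To make the difference between folded and unfolded code concrete, here is a toy builder, not LLVM's ``IRBuilder``. It folds an addition when both operands are constants and otherwise records an ``fadd`` instruction as text. ``ToyValue``, ``ToyBuilder`` and ``operandText`` are hypothetical. The idea mirrors why ``4+5`` above prints as ``ret double 9.000000e+00`` with no ``fadd``.

.. code-block:: c++

  #include <cassert>
  #include <cstdio>
  #include <string>
  #include <vector>

  // Toy value (hypothetical): either a constant or a named SSA value.
  struct ToyValue {
    bool IsConstant;
    double Constant;  // used when IsConstant is true
    std::string Name; // used otherwise, e.g. "a" or "addtmp"
  };

  // Prints an operand roughly the way the IR listings do.
  std::string operandText(const ToyValue &V) {
    if (!V.IsConstant)
      return "%" + V.Name;
    char Buf[32];
    std::snprintf(Buf, sizeof(Buf), "%e", V.Constant); // 9.0 -> "9.000000e+00"
    return Buf;
  }

  // Toy builder: folds constant + constant, otherwise emits an instruction.
  struct ToyBuilder {
    std::vector<std::string> Instructions;

    ToyValue CreateFAdd(const ToyValue &L, const ToyValue &R,
                        const std::string &Name) {
      if (L.IsConstant && R.IsConstant)
        return ToyValue{true, L.Constant + R.Constant, ""}; // no instruction
      Instructions.push_back("%" + Name + " = fadd double " + operandText(L) +
                             ", " + operandText(R));
      return ToyValue{false, 0.0, Name};
    }
  };

Adding two constants leaves ``Instructions`` empty and yields the constant 9. Adding the argument ``a`` to 5 records ``%addtmp = fadd double %a, 5.000000e+00``.
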
::

  ready> def foo(a b) a*a + 2*a*b + b*b;
  Read function definition:
  define double @foo(double %a, double %b) {
  entry:
    %multmp = fmul double %a, %a
    %multmp1 = fmul double 2.000000e+00, %a
    %multmp2 = fmul double %multmp1, %b
    %addtmp = fadd double %multmp, %multmp2
    %multmp3 = fmul double %b, %b
    %addtmp4 = fadd double %addtmp, %multmp3
    ret double %addtmp4
  }

This shows some simple arithmetic. Notice the striking similarity to the
LLVM builder calls that we use to create the instructions.

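The numeric suffixes in this output (``%multmp1``, ``%addtmp4``) come from LLVM making each local name unique within its function. The toy sketch below reproduces that exact sequence under an assumption consistent with the output above. A single counter is shared by all base names in a function and is appended whenever a requested name is already taken. Names are requested in the order ``BinaryExprAST::codegen`` creates instructions: left operand, right operand, then the operation itself. ``ToyNameTable`` is hypothetical and is not LLVM's ``ValueSymbolTable`` API.

.. code-block:: c++

  #include <cassert>
  #include <set>
  #include <string>

  // Hypothetical per-function name table.
  class ToyNameTable {
    std::set<std::string> Used;
    unsigned LastUnique = 0; // one counter shared by every base name
  public:
    std::string makeUnique(const std::string &Base) {
      if (Used.insert(Base).second)
        return Base; // first use of this name: keep it as-is
      while (true) {
        std::string Candidate = Base + std::to_string(++LastUnique);
        if (Used.insert(Candidate).second)
          return Candidate;
      }
    }
  };

  // Usage for foo: after the arguments "a" and "b", requesting
  // multmp, multmp, multmp, addtmp, multmp, addtmp yields
  // multmp, multmp1, multmp2, addtmp, multmp3, addtmp4.

Because the counter is shared, the second ``addtmp`` becomes ``addtmp4`` rather than ``addtmp1``, which matches the listing.
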
::

  ready> def bar(a) foo(a, 4.0) + bar(31337);
  Read function definition:
  define double @bar(double %a) {
  entry:
    %calltmp = call double @foo(double %a, double 4.000000e+00)
    %calltmp1 = call double @bar(double 3.133700e+04)
    %addtmp = fadd double %calltmp, %calltmp1
    ret double %addtmp
  }

This shows some function calls. Note that this function will never return
normally if you call it: ``bar`` calls itself unconditionally, so the
recursion has no base case. In the future we'll add conditional control
flow to actually make recursion useful :).

::

  ready> extern cos(x);
  Read extern:
  declare double @cos(double)

  ready> cos(1.234);
  Read top-level expression:
  define double @1() {
  entry:
    %calltmp = call double @cos(double 1.234000e+00)
    ret double %calltmp
  }

This shows an extern for the libm "cos" function, and a call to it.

.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up
   on highlighting this due to the first line.

::

  ready> ^D
  ; ModuleID = 'my cool jit'

  define double @0() {
  entry:
    ret double 9.000000e+00
  }

  define double @foo(double %a, double %b) {
  entry:
    %multmp = fmul double %a, %a
    %multmp1 = fmul double 2.000000e+00, %a
    %multmp2 = fmul double %multmp1, %b
    %addtmp = fadd double %multmp, %multmp2
    %multmp3 = fmul double %b, %b
    %addtmp4 = fadd double %addtmp, %multmp3
    ret double %addtmp4
  }

  define double @bar(double %a) {
  entry:
    %calltmp = call double @foo(double %a, double 4.000000e+00)
    %calltmp1 = call double @bar(double 3.133700e+04)
    %addtmp = fadd double %calltmp, %calltmp1
    ret double %addtmp
  }

  declare double @cos(double)

  define double @1() {
  entry:
    %calltmp = call double @cos(double 1.234000e+00)
    ret double %calltmp
  }

When you quit the current demo (by sending an EOF via CTRL+D on Linux
or CTRL+Z and ENTER on Windows), it dumps out the IR for the entire
generated module. Here you can see the big picture with all the
functions referencing each other.

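The quit-on-EOF behavior can be sketched with a toy driver loop. This is a simplification, not the tutorial's lexer-based driver. Each non-empty input line stands in for one parsed top-level item, and ``runToyDriver`` is a hypothetical name. The loop ends only when reading fails at end of input, and the whole module is printed after that.

.. code-block:: c++

  #include <cassert>
  #include <istream>
  #include <ostream>
  #include <sstream>
  #include <string>
  #include <vector>

  // Hypothetical toy driver: prompts, collects items until EOF, then
  // returns a dump of everything collected (the "entire module").
  std::string runToyDriver(std::istream &In, std::ostream &Prompt) {
    std::vector<std::string> Module;
    std::string Line;
    while (true) {
      Prompt << "ready> ";
      if (!std::getline(In, Line))
        break; // EOF: CTRL+D on Linux, CTRL+Z then ENTER on Windows
      if (Line.empty())
        continue; // nothing to "codegen" on a blank line
      Module.push_back(Line);
    }
    std::string Dump = "; ModuleID = 'my cool jit'\n";
    for (const std::string &Item : Module)
      Dump += "\n" + Item + "\n";
    return Dump;
  }

  // Usage: std::cout << runToyDriver(std::cin, std::cerr);

Prompts go to a separate stream so that, as in the tutorial, they do not mix with the dumped IR.
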
This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
we'll describe how to `add JIT codegen and optimizer
support <LangImpl04.html>`_ to this so we can actually start running
code!

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM code generator. Because this uses the LLVM libraries, we need
to link them in. To do this, we use the
`llvm-config <https://llvm.org/cmds/llvm-config.html>`_ tool to inform
our makefile/command line about which options to use:

.. code-block:: bash

  # Compile
  clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy
  # Run
  ./toy

Here is the code:

.. literalinclude:: ../../../examples/Kaleidoscope/Chapter3/toy.cpp
   :language: c++

`Next: Adding JIT and Optimizer Support <LangImpl04.html>`_