Mercurial > hg > CbC > CbC_llvm
comparison docs/tutorial/LangImpl7.rst @ 31:d22a1cf4041c
merge with the LLVM_original
author | Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> |
---|---|
date | Thu, 12 Dec 2013 14:37:49 +0900 |
parents | 9ad51c7bc036 |
children |
comparison
equal
deleted
inserted
replaced
2:4bc3e1cd2659 | 31:d22a1cf4041c |
---|---|
1 ======================================================= | |
2 Kaleidoscope: Extending the Language: Mutable Variables | |
3 ======================================================= | |
4 | |
5 .. contents:: | |
6 :local: | |
7 | |
8 Chapter 7 Introduction | |
9 ====================== | |
10 | |
11 Welcome to Chapter 7 of the "`Implementing a language with | |
12 LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a | |
13 very respectable, albeit simple, `functional programming | |
14 language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our | |
15 journey, we learned some parsing techniques, how to build and represent | |
16 an AST, how to build LLVM IR, and how to optimize the resultant code as | |
17 well as JIT compile it. | |
18 | |
19 While Kaleidoscope is interesting as a functional language, the fact | |
20 that it is functional makes it "too easy" to generate LLVM IR for it. In | |
21 particular, a functional language makes it very easy to build LLVM IR | |
22 directly in `SSA | |
23 form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. | |
24 Since LLVM requires that the input code be in SSA form, this is a very | |
25 nice property and it is often unclear to newcomers how to generate code | |
26 for an imperative language with mutable variables. | |
27 | |
28 The short (and happy) summary of this chapter is that there is no need | |
29 for your front-end to build SSA form: LLVM provides highly tuned and | |
30 well tested support for this, though the way it works is a bit | |
31 unexpected for some. | |
32 | |
33 Why is this a hard problem? | |
34 =========================== | |
35 | |
36 To understand why mutable variables cause complexities in SSA | |
37 construction, consider this extremely simple C example: | |
38 | |
39 .. code-block:: c | |
40 | |
41 int G, H; | |
42 int test(_Bool Condition) { | |
43 int X; | |
44 if (Condition) | |
45 X = G; | |
46 else | |
47 X = H; | |
48 return X; | |
49 } | |
50 | |
51 In this case, we have the variable "X", whose value depends on the path | |
52 executed in the program. Because there are two different possible values | |
53 for X before the return instruction, a PHI node is inserted to merge the | |
54 two values. The LLVM IR that we want for this example looks like this: | |
55 | |
56 .. code-block:: llvm | |
57 | |
58 @G = weak global i32 0 ; type of @G is i32* | |
59 @H = weak global i32 0 ; type of @H is i32* | |
60 | |
61 define i32 @test(i1 %Condition) { | |
62 entry: | |
63 br i1 %Condition, label %cond_true, label %cond_false | |
64 | |
65 cond_true: | |
66 %X.0 = load i32* @G | |
67 br label %cond_next | |
68 | |
69 cond_false: | |
70 %X.1 = load i32* @H | |
71 br label %cond_next | |
72 | |
73 cond_next: | |
74 %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] | |
75 ret i32 %X.2 | |
76 } | |
77 | |
78 In this example, the loads from the G and H global variables are | |
79 explicit in the LLVM IR, and they live in the then/else branches of the | |
80 if statement (cond\_true/cond\_false). In order to merge the incoming | |
81 values, the X.2 phi node in the cond\_next block selects the right value | |
82 to use based on where control flow is coming from: if control flow comes | |
83 from the cond\_false block, X.2 gets the value of X.1. Alternatively, if | |
84 control flow comes from cond\_true, it gets the value of X.0. The intent | |
85 of this chapter is not to explain the details of SSA form. For more | |
86 information, see one of the many `online | |
87 references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. | |
88 | |
89 The question for this article is "who places the phi nodes when lowering | |
90 assignments to mutable variables?". The issue here is that LLVM | |
91 *requires* that its IR be in SSA form: there is no "non-ssa" mode for | |
92 it. However, SSA construction requires non-trivial algorithms and data | |
93 structures, so it is inconvenient and wasteful for every front-end to | |
94 have to reproduce this logic. | |
95 | |
96 Memory in LLVM | |
97 ============== | |
98 | |
99 The 'trick' here is that while LLVM does require all register values to | |
100 be in SSA form, it does not require (or permit) memory objects to be in | |
101 SSA form. In the example above, note that the loads from G and H are | |
102 direct accesses to G and H: they are not renamed or versioned. This | |
103 differs from some other compiler systems, which do try to version memory | |
104 objects. In LLVM, instead of encoding dataflow analysis of memory into | |
105 the LLVM IR, it is handled with `Analysis | |
106 Passes <../WritingAnLLVMPass.html>`_ which are computed on demand. | |
107 | |
108 With this in mind, the high-level idea is that we want to make a stack | |
109 variable (which lives in memory, because it is on the stack) for each | |
110 mutable object in a function. To take advantage of this trick, we need | |
111 to talk about how LLVM represents stack variables. | |
112 | |
113 In LLVM, all memory accesses are explicit with load/store instructions, | |
114 and it is carefully designed not to have (or need) an "address-of" | |
115 operator. Notice how the type of the @G/@H global variables is actually | |
116 "i32\*" even though the variable is defined as "i32". What this means is | |
117 that @G defines *space* for an i32 in the global data area, but its | |
118 *name* actually refers to the address for that space. Stack variables | |
119 work the same way, except that instead of being declared with global | |
120 variable definitions, they are declared with the `LLVM alloca | |
121 instruction <../LangRef.html#i_alloca>`_: | |
122 | |
123 .. code-block:: llvm | |
124 | |
125 define i32 @example() { | |
126 entry: | |
127 %X = alloca i32 ; type of %X is i32*. | |
128 ... | |
129 %tmp = load i32* %X ; load the stack value %X from the stack. | |
130 %tmp2 = add i32 %tmp, 1 ; increment it | |
131 store i32 %tmp2, i32* %X ; store it back | |
132 ... | |
133 | |
134 This code shows an example of how you can declare and manipulate a stack | |
135 variable in the LLVM IR. Stack memory allocated with the alloca | |
136 instruction is fully general: you can pass the address of the stack slot | |
137 to functions, you can store it in other variables, etc. In our example | |
138 above, we could rewrite the example to use the alloca technique to avoid | |
139 using a PHI node: | |
140 | |
141 .. code-block:: llvm | |
142 | |
143 @G = weak global i32 0 ; type of @G is i32* | |
144 @H = weak global i32 0 ; type of @H is i32* | |
145 | |
146 define i32 @test(i1 %Condition) { | |
147 entry: | |
148 %X = alloca i32 ; type of %X is i32*. | |
149 br i1 %Condition, label %cond_true, label %cond_false | |
150 | |
151 cond_true: | |
152 %X.0 = load i32* @G | |
153 store i32 %X.0, i32* %X ; Update X | |
154 br label %cond_next | |
155 | |
156 cond_false: | |
157 %X.1 = load i32* @H | |
158 store i32 %X.1, i32* %X ; Update X | |
159 br label %cond_next | |
160 | |
161 cond_next: | |
162 %X.2 = load i32* %X ; Read X | |
163 ret i32 %X.2 | |
164 } | |
165 | |
166 With this, we have discovered a way to handle arbitrary mutable | |
167 variables without the need to create Phi nodes at all: | |
168 | |
169 #. Each mutable variable becomes a stack allocation. | |
170 #. Each read of the variable becomes a load from the stack. | |
171 #. Each update of the variable becomes a store to the stack. | |
172 #. Taking the address of a variable just uses the stack address | |
173 directly. | |
174 | |
175 While this solution has solved our immediate problem, it introduced | |
176 another one: we have now apparently introduced a lot of stack traffic | |
177 for very simple and common operations, a major performance problem. | |
178 Fortunately for us, the LLVM optimizer has a highly-tuned optimization | |
179 pass named "mem2reg" that handles this case, promoting allocas like this | |
180 into SSA registers, inserting Phi nodes as appropriate. If you run this | |
181 example through the pass, for example, you'll get: | |
182 | |
183 .. code-block:: bash | |
184 | |
185 $ llvm-as < example.ll | opt -mem2reg | llvm-dis | |
186 @G = weak global i32 0 | |
187 @H = weak global i32 0 | |
188 | |
189 define i32 @test(i1 %Condition) { | |
190 entry: | |
191 br i1 %Condition, label %cond_true, label %cond_false | |
192 | |
193 cond_true: | |
194 %X.0 = load i32* @G | |
195 br label %cond_next | |
196 | |
197 cond_false: | |
198 %X.1 = load i32* @H | |
199 br label %cond_next | |
200 | |
201 cond_next: | |
202 %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] | |
203 ret i32 %X.01 | |
204 } | |
205 | |
206 The mem2reg pass implements the standard "iterated dominance frontier" | |
207 algorithm for constructing SSA form and has a number of optimizations | |
208 that speed up (very common) degenerate cases. The mem2reg optimization | |
209 pass is the answer to dealing with mutable variables, and we highly | |
210 recommend that you depend on it. Note that mem2reg only works on | |
211 variables in certain circumstances: | |
212 | |
213 #. mem2reg is alloca-driven: it looks for allocas and if it can handle | |
214 them, it promotes them. It does not apply to global variables or heap | |
215 allocations. | |
216 #. mem2reg only looks for alloca instructions in the entry block of the | |
217 function. Being in the entry block guarantees that the alloca is only | |
218 executed once, which makes analysis simpler. | |
219 #. mem2reg only promotes allocas whose uses are direct loads and stores. | |
220 If the address of the stack object is passed to a function, or if any | |
221 funny pointer arithmetic is involved, the alloca will not be | |
222 promoted. | |
223 #. mem2reg only works on allocas of `first | |
224 class <../LangRef.html#t_classifications>`_ values (such as pointers, | |
225 scalars and vectors), and only if the array size of the allocation is | |
226 1 (or missing in the .ll file). mem2reg is not capable of promoting | |
227 structs or arrays to registers. Note that the "scalarrepl" pass is | |
228 more powerful and can promote structs, "unions", and arrays in many | |
229 cases. | |
230 | |
231 All of these properties are easy to satisfy for most imperative | |
232 languages, and we'll illustrate it below with Kaleidoscope. The final | |
233 question you may be asking is: should I bother with this nonsense for my | |
234 front-end? Wouldn't it be better if I just did SSA construction | |
235 directly, avoiding use of the mem2reg optimization pass? In short, we | |
236 strongly recommend that you use this technique for building SSA form, | |
237 unless there is an extremely good reason not to. Using this technique | |
238 is: | |
239 | |
240 - Proven and well tested: llvm-gcc and clang both use this technique | |
241 for local mutable variables. As such, the most common clients of LLVM | |
242 are using this to handle a bulk of their variables. You can be sure | |
243 that bugs are found fast and fixed early. | |
244 - Extremely Fast: mem2reg has a number of special cases that make it | |
245 fast in common cases as well as fully general. For example, it has | |
246 fast-paths for variables that are only used in a single block, | |
247 variables that only have one assignment point, good heuristics to | |
248 avoid insertion of unneeded phi nodes, etc. | |
249 - Needed for debug info generation: `Debug information in | |
250 LLVM <../SourceLevelDebugging.html>`_ relies on having the address of | |
251 the variable exposed so that debug info can be attached to it. This | |
252 technique dovetails very naturally with this style of debug info. | |
253 | |
254 If nothing else, this makes it much easier to get your front-end up and | |
255 running, and is very simple to implement. Lets extend Kaleidoscope with | |
256 mutable variables now! | |
257 | |
258 Mutable Variables in Kaleidoscope | |
259 ================================= | |
260 | |
261 Now that we know the sort of problem we want to tackle, lets see what | |
262 this looks like in the context of our little Kaleidoscope language. | |
263 We're going to add two features: | |
264 | |
265 #. The ability to mutate variables with the '=' operator. | |
266 #. The ability to define new variables. | |
267 | |
268 While the first item is really what this is about, we only have | |
269 variables for incoming arguments as well as for induction variables, and | |
270 redefining those only goes so far :). Also, the ability to define new | |
271 variables is a useful thing regardless of whether you will be mutating | |
272 them. Here's a motivating example that shows how we could use these: | |
273 | |
274 :: | |
275 | |
276 # Define ':' for sequencing: as a low-precedence operator that ignores operands | |
277 # and just returns the RHS. | |
278 def binary : 1 (x y) y; | |
279 | |
280 # Recursive fib, we could do this before. | |
281 def fib(x) | |
282 if (x < 3) then | |
283 1 | |
284 else | |
285 fib(x-1)+fib(x-2); | |
286 | |
287 # Iterative fib. | |
288 def fibi(x) | |
289 var a = 1, b = 1, c in | |
290 (for i = 3, i < x in | |
291 c = a + b : | |
292 a = b : | |
293 b = c) : | |
294 b; | |
295 | |
296 # Call it. | |
297 fibi(10); | |
298 | |
299 In order to mutate variables, we have to change our existing variables | |
300 to use the "alloca trick". Once we have that, we'll add our new | |
301 operator, then extend Kaleidoscope to support new variable definitions. | |
302 | |
303 Adjusting Existing Variables for Mutation | |
304 ========================================= | |
305 | |
306 The symbol table in Kaleidoscope is managed at code generation time by | |
307 the '``NamedValues``' map. This map currently keeps track of the LLVM | |
308 "Value\*" that holds the double value for the named variable. In order | |
309 to support mutation, we need to change this slightly, so that it | |
310 ``NamedValues`` holds the *memory location* of the variable in question. | |
311 Note that this change is a refactoring: it changes the structure of the | |
312 code, but does not (by itself) change the behavior of the compiler. All | |
313 of these changes are isolated in the Kaleidoscope code generator. | |
314 | |
315 At this point in Kaleidoscope's development, it only supports variables | |
316 for two things: incoming arguments to functions and the induction | |
317 variable of 'for' loops. For consistency, we'll allow mutation of these | |
318 variables in addition to other user-defined variables. This means that | |
319 these will both need memory locations. | |
320 | |
321 To start our transformation of Kaleidoscope, we'll change the | |
322 NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once | |
323 we do this, the C++ compiler will tell us what parts of the code we need | |
324 to update: | |
325 | |
326 .. code-block:: c++ | |
327 | |
328 static std::map<std::string, AllocaInst*> NamedValues; | |
329 | |
330 Also, since we will need to create these alloca's, we'll use a helper | |
331 function that ensures that the allocas are created in the entry block of | |
332 the function: | |
333 | |
334 .. code-block:: c++ | |
335 | |
336 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of | |
337 /// the function. This is used for mutable variables etc. | |
338 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, | |
339 const std::string &VarName) { | |
340 IRBuilder<> TmpB(&TheFunction->getEntryBlock(), | |
341 TheFunction->getEntryBlock().begin()); | |
342 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, | |
343 VarName.c_str()); | |
344 } | |
345 | |
346 This funny looking code creates an IRBuilder object that is pointing at | |
347 the first instruction (.begin()) of the entry block. It then creates an | |
348 alloca with the expected name and returns it. Because all values in | |
349 Kaleidoscope are doubles, there is no need to pass in a type to use. | |
350 | |
351 With this in place, the first functionality change we want to make is to | |
352 variable references. In our new scheme, variables live on the stack, so | |
353 code generating a reference to them actually needs to produce a load | |
354 from the stack slot: | |
355 | |
356 .. code-block:: c++ | |
357 | |
358 Value *VariableExprAST::Codegen() { | |
359 // Look this variable up in the function. | |
360 Value *V = NamedValues[Name]; | |
361 if (V == 0) return ErrorV("Unknown variable name"); | |
362 | |
363 // Load the value. | |
364 return Builder.CreateLoad(V, Name.c_str()); | |
365 } | |
366 | |
367 As you can see, this is pretty straightforward. Now we need to update | |
368 the things that define the variables to set up the alloca. We'll start | |
369 with ``ForExprAST::Codegen`` (see the `full code listing <#code>`_ for | |
370 the unabridged code): | |
371 | |
372 .. code-block:: c++ | |
373 | |
374 Function *TheFunction = Builder.GetInsertBlock()->getParent(); | |
375 | |
376 // Create an alloca for the variable in the entry block. | |
377 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); | |
378 | |
379 // Emit the start code first, without 'variable' in scope. | |
380 Value *StartVal = Start->Codegen(); | |
381 if (StartVal == 0) return 0; | |
382 | |
383 // Store the value into the alloca. | |
384 Builder.CreateStore(StartVal, Alloca); | |
385 ... | |
386 | |
387 // Compute the end condition. | |
388 Value *EndCond = End->Codegen(); | |
389 if (EndCond == 0) return EndCond; | |
390 | |
391 // Reload, increment, and restore the alloca. This handles the case where | |
392 // the body of the loop mutates the variable. | |
393 Value *CurVar = Builder.CreateLoad(Alloca); | |
394 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); | |
395 Builder.CreateStore(NextVar, Alloca); | |
396 ... | |
397 | |
398 This code is virtually identical to the code `before we allowed mutable | |
399 variables <LangImpl5.html#forcodegen>`_. The big difference is that we | |
400 no longer have to construct a PHI node, and we use load/store to access | |
401 the variable as needed. | |
402 | |
403 To support mutable argument variables, we need to also make allocas for | |
404 them. The code for this is also pretty simple: | |
405 | |
406 .. code-block:: c++ | |
407 | |
408 /// CreateArgumentAllocas - Create an alloca for each argument and register the | |
409 /// argument in the symbol table so that references to it will succeed. | |
410 void PrototypeAST::CreateArgumentAllocas(Function *F) { | |
411 Function::arg_iterator AI = F->arg_begin(); | |
412 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { | |
413 // Create an alloca for this variable. | |
414 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); | |
415 | |
416 // Store the initial value into the alloca. | |
417 Builder.CreateStore(AI, Alloca); | |
418 | |
419 // Add arguments to variable symbol table. | |
420 NamedValues[Args[Idx]] = Alloca; | |
421 } | |
422 } | |
423 | |
424 For each argument, we make an alloca, store the input value to the | |
425 function into the alloca, and register the alloca as the memory location | |
426 for the argument. This method gets invoked by ``FunctionAST::Codegen`` | |
427 right after it sets up the entry block for the function. | |
428 | |
429 The final missing piece is adding the mem2reg pass, which allows us to | |
430 get good codegen once again: | |
431 | |
432 .. code-block:: c++ | |
433 | |
434 // Set up the optimizer pipeline. Start with registering info about how the | |
435 // target lays out data structures. | |
436 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); | |
437 // Promote allocas to registers. | |
438 OurFPM.add(createPromoteMemoryToRegisterPass()); | |
439 // Do simple "peephole" optimizations and bit-twiddling optzns. | |
440 OurFPM.add(createInstructionCombiningPass()); | |
441 // Reassociate expressions. | |
442 OurFPM.add(createReassociatePass()); | |
443 | |
444 It is interesting to see what the code looks like before and after the | |
445 mem2reg optimization runs. For example, this is the before/after code | |
446 for our recursive fib function. Before the optimization: | |
447 | |
448 .. code-block:: llvm | |
449 | |
450 define double @fib(double %x) { | |
451 entry: | |
452 %x1 = alloca double | |
453 store double %x, double* %x1 | |
454 %x2 = load double* %x1 | |
455 %cmptmp = fcmp ult double %x2, 3.000000e+00 | |
456 %booltmp = uitofp i1 %cmptmp to double | |
457 %ifcond = fcmp one double %booltmp, 0.000000e+00 | |
458 br i1 %ifcond, label %then, label %else | |
459 | |
460 then: ; preds = %entry | |
461 br label %ifcont | |
462 | |
463 else: ; preds = %entry | |
464 %x3 = load double* %x1 | |
465 %subtmp = fsub double %x3, 1.000000e+00 | |
466 %calltmp = call double @fib(double %subtmp) | |
467 %x4 = load double* %x1 | |
468 %subtmp5 = fsub double %x4, 2.000000e+00 | |
469 %calltmp6 = call double @fib(double %subtmp5) | |
470 %addtmp = fadd double %calltmp, %calltmp6 | |
471 br label %ifcont | |
472 | |
473 ifcont: ; preds = %else, %then | |
474 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] | |
475 ret double %iftmp | |
476 } | |
477 | |
478 Here there is only one variable (x, the input argument) but you can | |
479 still see the extremely simple-minded code generation strategy we are | |
480 using. In the entry block, an alloca is created, and the initial input | |
481 value is stored into it. Each reference to the variable does a reload | |
482 from the stack. Also, note that we didn't modify the if/then/else | |
483 expression, so it still inserts a PHI node. While we could make an | |
484 alloca for it, it is actually easier to create a PHI node for it, so we | |
485 still just make the PHI. | |
486 | |
487 Here is the code after the mem2reg pass runs: | |
488 | |
489 .. code-block:: llvm | |
490 | |
491 define double @fib(double %x) { | |
492 entry: | |
493 %cmptmp = fcmp ult double %x, 3.000000e+00 | |
494 %booltmp = uitofp i1 %cmptmp to double | |
495 %ifcond = fcmp one double %booltmp, 0.000000e+00 | |
496 br i1 %ifcond, label %then, label %else | |
497 | |
498 then: | |
499 br label %ifcont | |
500 | |
501 else: | |
502 %subtmp = fsub double %x, 1.000000e+00 | |
503 %calltmp = call double @fib(double %subtmp) | |
504 %subtmp5 = fsub double %x, 2.000000e+00 | |
505 %calltmp6 = call double @fib(double %subtmp5) | |
506 %addtmp = fadd double %calltmp, %calltmp6 | |
507 br label %ifcont | |
508 | |
509 ifcont: ; preds = %else, %then | |
510 %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] | |
511 ret double %iftmp | |
512 } | |
513 | |
514 This is a trivial case for mem2reg, since there are no redefinitions of | |
515 the variable. The point of showing this is to calm your tension about | |
516 inserting such blatent inefficiencies :). | |
517 | |
518 After the rest of the optimizers run, we get: | |
519 | |
520 .. code-block:: llvm | |
521 | |
522 define double @fib(double %x) { | |
523 entry: | |
524 %cmptmp = fcmp ult double %x, 3.000000e+00 | |
525 %booltmp = uitofp i1 %cmptmp to double | |
526 %ifcond = fcmp ueq double %booltmp, 0.000000e+00 | |
527 br i1 %ifcond, label %else, label %ifcont | |
528 | |
529 else: | |
530 %subtmp = fsub double %x, 1.000000e+00 | |
531 %calltmp = call double @fib(double %subtmp) | |
532 %subtmp5 = fsub double %x, 2.000000e+00 | |
533 %calltmp6 = call double @fib(double %subtmp5) | |
534 %addtmp = fadd double %calltmp, %calltmp6 | |
535 ret double %addtmp | |
536 | |
537 ifcont: | |
538 ret double 1.000000e+00 | |
539 } | |
540 | |
541 Here we see that the simplifycfg pass decided to clone the return | |
542 instruction into the end of the 'else' block. This allowed it to | |
543 eliminate some branches and the PHI node. | |
544 | |
545 Now that all symbol table references are updated to use stack variables, | |
546 we'll add the assignment operator. | |
547 | |
548 New Assignment Operator | |
549 ======================= | |
550 | |
551 With our current framework, adding a new assignment operator is really | |
552 simple. We will parse it just like any other binary operator, but handle | |
553 it internally (instead of allowing the user to define it). The first | |
554 step is to set a precedence: | |
555 | |
556 .. code-block:: c++ | |
557 | |
558 int main() { | |
559 // Install standard binary operators. | |
560 // 1 is lowest precedence. | |
561 BinopPrecedence['='] = 2; | |
562 BinopPrecedence['<'] = 10; | |
563 BinopPrecedence['+'] = 20; | |
564 BinopPrecedence['-'] = 20; | |
565 | |
566 Now that the parser knows the precedence of the binary operator, it | |
567 takes care of all the parsing and AST generation. We just need to | |
568 implement codegen for the assignment operator. This looks like: | |
569 | |
570 .. code-block:: c++ | |
571 | |
572 Value *BinaryExprAST::Codegen() { | |
573 // Special case '=' because we don't want to emit the LHS as an expression. | |
574 if (Op == '=') { | |
575 // Assignment requires the LHS to be an identifier. | |
576 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); | |
577 if (!LHSE) | |
578 return ErrorV("destination of '=' must be a variable"); | |
579 | |
580 Unlike the rest of the binary operators, our assignment operator doesn't | |
581 follow the "emit LHS, emit RHS, do computation" model. As such, it is | |
582 handled as a special case before the other binary operators are handled. | |
583 The other strange thing is that it requires the LHS to be a variable. It | |
584 is invalid to have "(x+1) = expr" - only things like "x = expr" are | |
585 allowed. | |
586 | |
587 .. code-block:: c++ | |
588 | |
589 // Codegen the RHS. | |
590 Value *Val = RHS->Codegen(); | |
591 if (Val == 0) return 0; | |
592 | |
593 // Look up the name. | |
594 Value *Variable = NamedValues[LHSE->getName()]; | |
595 if (Variable == 0) return ErrorV("Unknown variable name"); | |
596 | |
597 Builder.CreateStore(Val, Variable); | |
598 return Val; | |
599 } | |
600 ... | |
601 | |
602 Once we have the variable, codegen'ing the assignment is | |
603 straightforward: we emit the RHS of the assignment, create a store, and | |
604 return the computed value. Returning a value allows for chained | |
605 assignments like "X = (Y = Z)". | |
606 | |
607 Now that we have an assignment operator, we can mutate loop variables | |
608 and arguments. For example, we can now run code like this: | |
609 | |
610 :: | |
611 | |
612 # Function to print a double. | |
613 extern printd(x); | |
614 | |
615 # Define ':' for sequencing: as a low-precedence operator that ignores operands | |
616 # and just returns the RHS. | |
617 def binary : 1 (x y) y; | |
618 | |
619 def test(x) | |
620 printd(x) : | |
621 x = 4 : | |
622 printd(x); | |
623 | |
624 test(123); | |
625 | |
626 When run, this example prints "123" and then "4", showing that we did | |
627 actually mutate the value! Okay, we have now officially implemented our | |
628 goal: getting this to work requires SSA construction in the general | |
629 case. However, to be really useful, we want the ability to define our | |
630 own local variables, lets add this next! | |
631 | |
632 User-defined Local Variables | |
633 ============================ | |
634 | |
635 Adding var/in is just like any other other extensions we made to | |
636 Kaleidoscope: we extend the lexer, the parser, the AST and the code | |
637 generator. The first step for adding our new 'var/in' construct is to | |
638 extend the lexer. As before, this is pretty trivial, the code looks like | |
639 this: | |
640 | |
641 .. code-block:: c++ | |
642 | |
643 enum Token { | |
644 ... | |
645 // var definition | |
646 tok_var = -13 | |
647 ... | |
648 } | |
649 ... | |
650 static int gettok() { | |
651 ... | |
652 if (IdentifierStr == "in") return tok_in; | |
653 if (IdentifierStr == "binary") return tok_binary; | |
654 if (IdentifierStr == "unary") return tok_unary; | |
655 if (IdentifierStr == "var") return tok_var; | |
656 return tok_identifier; | |
657 ... | |
658 | |
659 The next step is to define the AST node that we will construct. For | |
660 var/in, it looks like this: | |
661 | |
662 .. code-block:: c++ | |
663 | |
664 /// VarExprAST - Expression class for var/in | |
665 class VarExprAST : public ExprAST { | |
666 std::vector<std::pair<std::string, ExprAST*> > VarNames; | |
667 ExprAST *Body; | |
668 public: | |
669 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, | |
670 ExprAST *body) | |
671 : VarNames(varnames), Body(body) {} | |
672 | |
673 virtual Value *Codegen(); | |
674 }; | |
675 | |
676 var/in allows a list of names to be defined all at once, and each name | |
677 can optionally have an initializer value. As such, we capture this | |
678 information in the VarNames vector. Also, var/in has a body, this body | |
679 is allowed to access the variables defined by the var/in. | |
680 | |
681 With this in place, we can define the parser pieces. The first thing we | |
682 do is add it as a primary expression: | |
683 | |
684 .. code-block:: c++ | |
685 | |
686 /// primary | |
687 /// ::= identifierexpr | |
688 /// ::= numberexpr | |
689 /// ::= parenexpr | |
690 /// ::= ifexpr | |
691 /// ::= forexpr | |
692 /// ::= varexpr | |
693 static ExprAST *ParsePrimary() { | |
694 switch (CurTok) { | |
695 default: return Error("unknown token when expecting an expression"); | |
696 case tok_identifier: return ParseIdentifierExpr(); | |
697 case tok_number: return ParseNumberExpr(); | |
698 case '(': return ParseParenExpr(); | |
699 case tok_if: return ParseIfExpr(); | |
700 case tok_for: return ParseForExpr(); | |
701 case tok_var: return ParseVarExpr(); | |
702 } | |
703 } | |
704 | |
705 Next we define ParseVarExpr: | |
706 | |
707 .. code-block:: c++ | |
708 | |
709 /// varexpr ::= 'var' identifier ('=' expression)? | |
710 // (',' identifier ('=' expression)?)* 'in' expression | |
711 static ExprAST *ParseVarExpr() { | |
712 getNextToken(); // eat the var. | |
713 | |
714 std::vector<std::pair<std::string, ExprAST*> > VarNames; | |
715 | |
716 // At least one variable name is required. | |
717 if (CurTok != tok_identifier) | |
718 return Error("expected identifier after var"); | |
719 | |
720 The first part of this code parses the list of identifier/expr pairs | |
721 into the local ``VarNames`` vector. | |
722 | |
723 .. code-block:: c++ | |
724 | |
725 while (1) { | |
726 std::string Name = IdentifierStr; | |
727 getNextToken(); // eat identifier. | |
728 | |
729 // Read the optional initializer. | |
730 ExprAST *Init = 0; | |
731 if (CurTok == '=') { | |
732 getNextToken(); // eat the '='. | |
733 | |
734 Init = ParseExpression(); | |
735 if (Init == 0) return 0; | |
736 } | |
737 | |
738 VarNames.push_back(std::make_pair(Name, Init)); | |
739 | |
740 // End of var list, exit loop. | |
741 if (CurTok != ',') break; | |
742 getNextToken(); // eat the ','. | |
743 | |
744 if (CurTok != tok_identifier) | |
745 return Error("expected identifier list after var"); | |
746 } | |
747 | |
748 Once all the variables are parsed, we then parse the body and create the | |
749 AST node: | |
750 | |
751 .. code-block:: c++ | |
752 | |
753 // At this point, we have to have 'in'. | |
754 if (CurTok != tok_in) | |
755 return Error("expected 'in' keyword after 'var'"); | |
756 getNextToken(); // eat 'in'. | |
757 | |
758 ExprAST *Body = ParseExpression(); | |
759 if (Body == 0) return 0; | |
760 | |
761 return new VarExprAST(VarNames, Body); | |
762 } | |
763 | |
764 Now that we can parse and represent the code, we need to support | |
765 emission of LLVM IR for it. This code starts out with: | |
766 | |
767 .. code-block:: c++ | |
768 | |
769 Value *VarExprAST::Codegen() { | |
770 std::vector<AllocaInst *> OldBindings; | |
771 | |
772 Function *TheFunction = Builder.GetInsertBlock()->getParent(); | |
773 | |
774 // Register all variables and emit their initializer. | |
775 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { | |
776 const std::string &VarName = VarNames[i].first; | |
777 ExprAST *Init = VarNames[i].second; | |
778 | |
779 Basically it loops over all the variables, installing them one at a | |
780 time. For each variable we put into the symbol table, we remember the | |
781 previous value that we replace in OldBindings. | |
782 | |
783 .. code-block:: c++ | |
784 | |
785 // Emit the initializer before adding the variable to scope, this prevents | |
786 // the initializer from referencing the variable itself, and permits stuff | |
787 // like this: | |
788 // var a = 1 in | |
789 // var a = a in ... # refers to outer 'a'. | |
790 Value *InitVal; | |
791 if (Init) { | |
792 InitVal = Init->Codegen(); | |
793 if (InitVal == 0) return 0; | |
794 } else { // If not specified, use 0.0. | |
795 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); | |
796 } | |
797 | |
798 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); | |
799 Builder.CreateStore(InitVal, Alloca); | |
800 | |
801 // Remember the old variable binding so that we can restore the binding when | |
802 // we unrecurse. | |
803 OldBindings.push_back(NamedValues[VarName]); | |
804 | |
805 // Remember this binding. | |
806 NamedValues[VarName] = Alloca; | |
807 } | |
808 | |
809 There are more comments here than code. The basic idea is that we emit | |
810 the initializer, create the alloca, then update the symbol table to | |
811 point to it. Once all the variables are installed in the symbol table, | |
812 we evaluate the body of the var/in expression: | |
813 | |
814 .. code-block:: c++ | |
815 | |
816 // Codegen the body, now that all vars are in scope. | |
817 Value *BodyVal = Body->Codegen(); | |
818 if (BodyVal == 0) return 0; | |
819 | |
820 Finally, before returning, we restore the previous variable bindings: | |
821 | |
822 .. code-block:: c++ | |
823 | |
824 // Pop all our variables from scope. | |
825 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) | |
826 NamedValues[VarNames[i].first] = OldBindings[i]; | |
827 | |
828 // Return the body computation. | |
829 return BodyVal; | |
830 } | |
831 | |
832 The end result of all of this is that we get properly scoped variable | |
833 definitions, and we even (trivially) allow mutation of them :). | |
834 | |
835 With this, we completed what we set out to do. Our nice iterative fib | |
836 example from the intro compiles and runs just fine. The mem2reg pass | |
837 optimizes all of our stack variables into SSA registers, inserting PHI | |
838 nodes where needed, and our front-end remains simple: no "iterated | |
839 dominance frontier" computation anywhere in sight. | |
840 | |
841 Full Code Listing | |
842 ================= | |
843 | |
844 Here is the complete code listing for our running example, enhanced with | |
845 mutable variables and var/in support. To build this example, use: | |
846 | |
847 .. code-block:: bash | |
848 | |
849 # Compile | |
850 clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy | |
851 # Run | |
852 ./toy | |
853 | |
854 Here is the code: | |
855 | |
856 .. code-block:: c++ | |
857 | |
858 #include "llvm/DerivedTypes.h" | |
859 #include "llvm/ExecutionEngine/ExecutionEngine.h" | |
860 #include "llvm/ExecutionEngine/JIT.h" | |
861 #include "llvm/IRBuilder.h" | |
862 #include "llvm/LLVMContext.h" | |
863 #include "llvm/Module.h" | |
864 #include "llvm/PassManager.h" | |
865 #include "llvm/Analysis/Verifier.h" | |
866 #include "llvm/Analysis/Passes.h" | |
867 #include "llvm/DataLayout.h" | |
868 #include "llvm/Transforms/Scalar.h" | |
869 #include "llvm/Support/TargetSelect.h" | |
870 #include <cstdio> | |
871 #include <string> | |
872 #include <map> | |
873 #include <vector> | |
874 using namespace llvm; | |
875 | |
876 //===----------------------------------------------------------------------===// | |
877 // Lexer | |
878 //===----------------------------------------------------------------------===// | |
879 | |
880 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one | |
881 // of these for known things. | |
882 enum Token { | |
883 tok_eof = -1, | |
884 | |
885 // commands | |
886 tok_def = -2, tok_extern = -3, | |
887 | |
888 // primary | |
889 tok_identifier = -4, tok_number = -5, | |
890 | |
891 // control | |
892 tok_if = -6, tok_then = -7, tok_else = -8, | |
893 tok_for = -9, tok_in = -10, | |
894 | |
895 // operators | |
896 tok_binary = -11, tok_unary = -12, | |
897 | |
898 // var definition | |
899 tok_var = -13 | |
900 }; | |
901 | |
902 static std::string IdentifierStr; // Filled in if tok_identifier | |
903 static double NumVal; // Filled in if tok_number | |
904 | |
905 /// gettok - Return the next token from standard input. | |
906 static int gettok() { | |
907 static int LastChar = ' '; | |
908 | |
909 // Skip any whitespace. | |
910 while (isspace(LastChar)) | |
911 LastChar = getchar(); | |
912 | |
913 if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* | |
914 IdentifierStr = LastChar; | |
915 while (isalnum((LastChar = getchar()))) | |
916 IdentifierStr += LastChar; | |
917 | |
918 if (IdentifierStr == "def") return tok_def; | |
919 if (IdentifierStr == "extern") return tok_extern; | |
920 if (IdentifierStr == "if") return tok_if; | |
921 if (IdentifierStr == "then") return tok_then; | |
922 if (IdentifierStr == "else") return tok_else; | |
923 if (IdentifierStr == "for") return tok_for; | |
924 if (IdentifierStr == "in") return tok_in; | |
925 if (IdentifierStr == "binary") return tok_binary; | |
926 if (IdentifierStr == "unary") return tok_unary; | |
927 if (IdentifierStr == "var") return tok_var; | |
928 return tok_identifier; | |
929 } | |
930 | |
931 if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ | |
932 std::string NumStr; | |
933 do { | |
934 NumStr += LastChar; | |
935 LastChar = getchar(); | |
936 } while (isdigit(LastChar) || LastChar == '.'); | |
937 | |
938 NumVal = strtod(NumStr.c_str(), 0); | |
939 return tok_number; | |
940 } | |
941 | |
942 if (LastChar == '#') { | |
943 // Comment until end of line. | |
944 do LastChar = getchar(); | |
945 while (LastChar != EOF && LastChar != '\n' && LastChar != '\r'); | |
946 | |
947 if (LastChar != EOF) | |
948 return gettok(); | |
949 } | |
950 | |
951 // Check for end of file. Don't eat the EOF. | |
952 if (LastChar == EOF) | |
953 return tok_eof; | |
954 | |
955 // Otherwise, just return the character as its ascii value. | |
956 int ThisChar = LastChar; | |
957 LastChar = getchar(); | |
958 return ThisChar; | |
959 } | |
960 | |
961 //===----------------------------------------------------------------------===// | |
962 // Abstract Syntax Tree (aka Parse Tree) | |
963 //===----------------------------------------------------------------------===// | |
964 | |
965 /// ExprAST - Base class for all expression nodes. | |
966 class ExprAST { | |
967 public: | |
968 virtual ~ExprAST() {} | |
969 virtual Value *Codegen() = 0; | |
970 }; | |
971 | |
972 /// NumberExprAST - Expression class for numeric literals like "1.0". | |
973 class NumberExprAST : public ExprAST { | |
974 double Val; | |
975 public: | |
976 NumberExprAST(double val) : Val(val) {} | |
977 virtual Value *Codegen(); | |
978 }; | |
979 | |
980 /// VariableExprAST - Expression class for referencing a variable, like "a". | |
981 class VariableExprAST : public ExprAST { | |
982 std::string Name; | |
983 public: | |
984 VariableExprAST(const std::string &name) : Name(name) {} | |
985 const std::string &getName() const { return Name; } | |
986 virtual Value *Codegen(); | |
987 }; | |
988 | |
989 /// UnaryExprAST - Expression class for a unary operator. | |
990 class UnaryExprAST : public ExprAST { | |
991 char Opcode; | |
992 ExprAST *Operand; | |
993 public: | |
994 UnaryExprAST(char opcode, ExprAST *operand) | |
995 : Opcode(opcode), Operand(operand) {} | |
996 virtual Value *Codegen(); | |
997 }; | |
998 | |
999 /// BinaryExprAST - Expression class for a binary operator. | |
1000 class BinaryExprAST : public ExprAST { | |
1001 char Op; | |
1002 ExprAST *LHS, *RHS; | |
1003 public: | |
1004 BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs) | |
1005 : Op(op), LHS(lhs), RHS(rhs) {} | |
1006 virtual Value *Codegen(); | |
1007 }; | |
1008 | |
1009 /// CallExprAST - Expression class for function calls. | |
1010 class CallExprAST : public ExprAST { | |
1011 std::string Callee; | |
1012 std::vector<ExprAST*> Args; | |
1013 public: | |
1014 CallExprAST(const std::string &callee, std::vector<ExprAST*> &args) | |
1015 : Callee(callee), Args(args) {} | |
1016 virtual Value *Codegen(); | |
1017 }; | |
1018 | |
1019 /// IfExprAST - Expression class for if/then/else. | |
1020 class IfExprAST : public ExprAST { | |
1021 ExprAST *Cond, *Then, *Else; | |
1022 public: | |
1023 IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else) | |
1024 : Cond(cond), Then(then), Else(_else) {} | |
1025 virtual Value *Codegen(); | |
1026 }; | |
1027 | |
1028 /// ForExprAST - Expression class for for/in. | |
1029 class ForExprAST : public ExprAST { | |
1030 std::string VarName; | |
1031 ExprAST *Start, *End, *Step, *Body; | |
1032 public: | |
1033 ForExprAST(const std::string &varname, ExprAST *start, ExprAST *end, | |
1034 ExprAST *step, ExprAST *body) | |
1035 : VarName(varname), Start(start), End(end), Step(step), Body(body) {} | |
1036 virtual Value *Codegen(); | |
1037 }; | |
1038 | |
1039 /// VarExprAST - Expression class for var/in | |
1040 class VarExprAST : public ExprAST { | |
1041 std::vector<std::pair<std::string, ExprAST*> > VarNames; | |
1042 ExprAST *Body; | |
1043 public: | |
1044 VarExprAST(const std::vector<std::pair<std::string, ExprAST*> > &varnames, | |
1045 ExprAST *body) | |
1046 : VarNames(varnames), Body(body) {} | |
1047 | |
1048 virtual Value *Codegen(); | |
1049 }; | |
1050 | |
1051 /// PrototypeAST - This class represents the "prototype" for a function, | |
1052 /// which captures its name, and its argument names (thus implicitly the number | |
1053 /// of arguments the function takes), as well as if it is an operator. | |
1054 class PrototypeAST { | |
1055 std::string Name; | |
1056 std::vector<std::string> Args; | |
1057 bool isOperator; | |
1058 unsigned Precedence; // Precedence if a binary op. | |
1059 public: | |
1060 PrototypeAST(const std::string &name, const std::vector<std::string> &args, | |
1061 bool isoperator = false, unsigned prec = 0) | |
1062 : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {} | |
1063 | |
1064 bool isUnaryOp() const { return isOperator && Args.size() == 1; } | |
1065 bool isBinaryOp() const { return isOperator && Args.size() == 2; } | |
1066 | |
1067 char getOperatorName() const { | |
1068 assert(isUnaryOp() || isBinaryOp()); | |
1069 return Name[Name.size()-1]; | |
1070 } | |
1071 | |
1072 unsigned getBinaryPrecedence() const { return Precedence; } | |
1073 | |
1074 Function *Codegen(); | |
1075 | |
1076 void CreateArgumentAllocas(Function *F); | |
1077 }; | |
1078 | |
1079 /// FunctionAST - This class represents a function definition itself. | |
1080 class FunctionAST { | |
1081 PrototypeAST *Proto; | |
1082 ExprAST *Body; | |
1083 public: | |
1084 FunctionAST(PrototypeAST *proto, ExprAST *body) | |
1085 : Proto(proto), Body(body) {} | |
1086 | |
1087 Function *Codegen(); | |
1088 }; | |
1089 | |
1090 //===----------------------------------------------------------------------===// | |
1091 // Parser | |
1092 //===----------------------------------------------------------------------===// | |
1093 | |
1094 /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current | |
1095 /// token the parser is looking at. getNextToken reads another token from the | |
1096 /// lexer and updates CurTok with its results. | |
1097 static int CurTok; | |
1098 static int getNextToken() { | |
1099 return CurTok = gettok(); | |
1100 } | |
1101 | |
1102 /// BinopPrecedence - This holds the precedence for each binary operator that is | |
1103 /// defined. | |
1104 static std::map<char, int> BinopPrecedence; | |
1105 | |
1106 /// GetTokPrecedence - Get the precedence of the pending binary operator token. | |
1107 static int GetTokPrecedence() { | |
1108 if (!isascii(CurTok)) | |
1109 return -1; | |
1110 | |
1111 // Make sure it's a declared binop. | |
1112 int TokPrec = BinopPrecedence[CurTok]; | |
1113 if (TokPrec <= 0) return -1; | |
1114 return TokPrec; | |
1115 } | |
1116 | |
1117 /// Error* - These are little helper functions for error handling. | |
1118 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;} | |
1119 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; } | |
1120 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; } | |
1121 | |
1122 static ExprAST *ParseExpression(); | |
1123 | |
1124 /// identifierexpr | |
1125 /// ::= identifier | |
1126 /// ::= identifier '(' expression* ')' | |
1127 static ExprAST *ParseIdentifierExpr() { | |
1128 std::string IdName = IdentifierStr; | |
1129 | |
1130 getNextToken(); // eat identifier. | |
1131 | |
1132 if (CurTok != '(') // Simple variable ref. | |
1133 return new VariableExprAST(IdName); | |
1134 | |
1135 // Call. | |
1136 getNextToken(); // eat ( | |
1137 std::vector<ExprAST*> Args; | |
1138 if (CurTok != ')') { | |
1139 while (1) { | |
1140 ExprAST *Arg = ParseExpression(); | |
1141 if (!Arg) return 0; | |
1142 Args.push_back(Arg); | |
1143 | |
1144 if (CurTok == ')') break; | |
1145 | |
1146 if (CurTok != ',') | |
1147 return Error("Expected ')' or ',' in argument list"); | |
1148 getNextToken(); | |
1149 } | |
1150 } | |
1151 | |
1152 // Eat the ')'. | |
1153 getNextToken(); | |
1154 | |
1155 return new CallExprAST(IdName, Args); | |
1156 } | |
1157 | |
1158 /// numberexpr ::= number | |
1159 static ExprAST *ParseNumberExpr() { | |
1160 ExprAST *Result = new NumberExprAST(NumVal); | |
1161 getNextToken(); // consume the number | |
1162 return Result; | |
1163 } | |
1164 | |
1165 /// parenexpr ::= '(' expression ')' | |
1166 static ExprAST *ParseParenExpr() { | |
1167 getNextToken(); // eat (. | |
1168 ExprAST *V = ParseExpression(); | |
1169 if (!V) return 0; | |
1170 | |
1171 if (CurTok != ')') | |
1172 return Error("expected ')'"); | |
1173 getNextToken(); // eat ). | |
1174 return V; | |
1175 } | |
1176 | |
1177 /// ifexpr ::= 'if' expression 'then' expression 'else' expression | |
1178 static ExprAST *ParseIfExpr() { | |
1179 getNextToken(); // eat the if. | |
1180 | |
1181 // condition. | |
1182 ExprAST *Cond = ParseExpression(); | |
1183 if (!Cond) return 0; | |
1184 | |
1185 if (CurTok != tok_then) | |
1186 return Error("expected then"); | |
1187 getNextToken(); // eat the then | |
1188 | |
1189 ExprAST *Then = ParseExpression(); | |
1190 if (Then == 0) return 0; | |
1191 | |
1192 if (CurTok != tok_else) | |
1193 return Error("expected else"); | |
1194 | |
1195 getNextToken(); | |
1196 | |
1197 ExprAST *Else = ParseExpression(); | |
1198 if (!Else) return 0; | |
1199 | |
1200 return new IfExprAST(Cond, Then, Else); | |
1201 } | |
1202 | |
1203 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression | |
1204 static ExprAST *ParseForExpr() { | |
1205 getNextToken(); // eat the for. | |
1206 | |
1207 if (CurTok != tok_identifier) | |
1208 return Error("expected identifier after for"); | |
1209 | |
1210 std::string IdName = IdentifierStr; | |
1211 getNextToken(); // eat identifier. | |
1212 | |
1213 if (CurTok != '=') | |
1214 return Error("expected '=' after for"); | |
1215 getNextToken(); // eat '='. | |
1216 | |
1217 | |
1218 ExprAST *Start = ParseExpression(); | |
1219 if (Start == 0) return 0; | |
1220 if (CurTok != ',') | |
1221 return Error("expected ',' after for start value"); | |
1222 getNextToken(); | |
1223 | |
1224 ExprAST *End = ParseExpression(); | |
1225 if (End == 0) return 0; | |
1226 | |
1227 // The step value is optional. | |
1228 ExprAST *Step = 0; | |
1229 if (CurTok == ',') { | |
1230 getNextToken(); | |
1231 Step = ParseExpression(); | |
1232 if (Step == 0) return 0; | |
1233 } | |
1234 | |
1235 if (CurTok != tok_in) | |
1236 return Error("expected 'in' after for"); | |
1237 getNextToken(); // eat 'in'. | |
1238 | |
1239 ExprAST *Body = ParseExpression(); | |
1240 if (Body == 0) return 0; | |
1241 | |
1242 return new ForExprAST(IdName, Start, End, Step, Body); | |
1243 } | |
1244 | |
1245 /// varexpr ::= 'var' identifier ('=' expression)? | |
1246 // (',' identifier ('=' expression)?)* 'in' expression | |
1247 static ExprAST *ParseVarExpr() { | |
1248 getNextToken(); // eat the var. | |
1249 | |
1250 std::vector<std::pair<std::string, ExprAST*> > VarNames; | |
1251 | |
1252 // At least one variable name is required. | |
1253 if (CurTok != tok_identifier) | |
1254 return Error("expected identifier after var"); | |
1255 | |
1256 while (1) { | |
1257 std::string Name = IdentifierStr; | |
1258 getNextToken(); // eat identifier. | |
1259 | |
1260 // Read the optional initializer. | |
1261 ExprAST *Init = 0; | |
1262 if (CurTok == '=') { | |
1263 getNextToken(); // eat the '='. | |
1264 | |
1265 Init = ParseExpression(); | |
1266 if (Init == 0) return 0; | |
1267 } | |
1268 | |
1269 VarNames.push_back(std::make_pair(Name, Init)); | |
1270 | |
1271 // End of var list, exit loop. | |
1272 if (CurTok != ',') break; | |
1273 getNextToken(); // eat the ','. | |
1274 | |
1275 if (CurTok != tok_identifier) | |
1276 return Error("expected identifier list after var"); | |
1277 } | |
1278 | |
1279 // At this point, we have to have 'in'. | |
1280 if (CurTok != tok_in) | |
1281 return Error("expected 'in' keyword after 'var'"); | |
1282 getNextToken(); // eat 'in'. | |
1283 | |
1284 ExprAST *Body = ParseExpression(); | |
1285 if (Body == 0) return 0; | |
1286 | |
1287 return new VarExprAST(VarNames, Body); | |
1288 } | |
1289 | |
1290 /// primary | |
1291 /// ::= identifierexpr | |
1292 /// ::= numberexpr | |
1293 /// ::= parenexpr | |
1294 /// ::= ifexpr | |
1295 /// ::= forexpr | |
1296 /// ::= varexpr | |
1297 static ExprAST *ParsePrimary() { | |
1298 switch (CurTok) { | |
1299 default: return Error("unknown token when expecting an expression"); | |
1300 case tok_identifier: return ParseIdentifierExpr(); | |
1301 case tok_number: return ParseNumberExpr(); | |
1302 case '(': return ParseParenExpr(); | |
1303 case tok_if: return ParseIfExpr(); | |
1304 case tok_for: return ParseForExpr(); | |
1305 case tok_var: return ParseVarExpr(); | |
1306 } | |
1307 } | |
1308 | |
1309 /// unary | |
1310 /// ::= primary | |
1311 /// ::= '!' unary | |
1312 static ExprAST *ParseUnary() { | |
1313 // If the current token is not an operator, it must be a primary expr. | |
1314 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',') | |
1315 return ParsePrimary(); | |
1316 | |
1317 // If this is a unary operator, read it. | |
1318 int Opc = CurTok; | |
1319 getNextToken(); | |
1320 if (ExprAST *Operand = ParseUnary()) | |
1321 return new UnaryExprAST(Opc, Operand); | |
1322 return 0; | |
1323 } | |
1324 | |
1325 /// binoprhs | |
1326 /// ::= ('+' unary)* | |
1327 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) { | |
1328 // If this is a binop, find its precedence. | |
1329 while (1) { | |
1330 int TokPrec = GetTokPrecedence(); | |
1331 | |
1332 // If this is a binop that binds at least as tightly as the current binop, | |
1333 // consume it, otherwise we are done. | |
1334 if (TokPrec < ExprPrec) | |
1335 return LHS; | |
1336 | |
1337 // Okay, we know this is a binop. | |
1338 int BinOp = CurTok; | |
1339 getNextToken(); // eat binop | |
1340 | |
1341 // Parse the unary expression after the binary operator. | |
1342 ExprAST *RHS = ParseUnary(); | |
1343 if (!RHS) return 0; | |
1344 | |
1345 // If BinOp binds less tightly with RHS than the operator after RHS, let | |
1346 // the pending operator take RHS as its LHS. | |
1347 int NextPrec = GetTokPrecedence(); | |
1348 if (TokPrec < NextPrec) { | |
1349 RHS = ParseBinOpRHS(TokPrec+1, RHS); | |
1350 if (RHS == 0) return 0; | |
1351 } | |
1352 | |
1353 // Merge LHS/RHS. | |
1354 LHS = new BinaryExprAST(BinOp, LHS, RHS); | |
1355 } | |
1356 } | |
1357 | |
1358 /// expression | |
1359 /// ::= unary binoprhs | |
1360 /// | |
1361 static ExprAST *ParseExpression() { | |
1362 ExprAST *LHS = ParseUnary(); | |
1363 if (!LHS) return 0; | |
1364 | |
1365 return ParseBinOpRHS(0, LHS); | |
1366 } | |
1367 | |
1368 /// prototype | |
1369 /// ::= id '(' id* ')' | |
1370 /// ::= binary LETTER number? (id, id) | |
1371 /// ::= unary LETTER (id) | |
1372 static PrototypeAST *ParsePrototype() { | |
1373 std::string FnName; | |
1374 | |
1375 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. | |
1376 unsigned BinaryPrecedence = 30; | |
1377 | |
1378 switch (CurTok) { | |
1379 default: | |
1380 return ErrorP("Expected function name in prototype"); | |
1381 case tok_identifier: | |
1382 FnName = IdentifierStr; | |
1383 Kind = 0; | |
1384 getNextToken(); | |
1385 break; | |
1386 case tok_unary: | |
1387 getNextToken(); | |
1388 if (!isascii(CurTok)) | |
1389 return ErrorP("Expected unary operator"); | |
1390 FnName = "unary"; | |
1391 FnName += (char)CurTok; | |
1392 Kind = 1; | |
1393 getNextToken(); | |
1394 break; | |
1395 case tok_binary: | |
1396 getNextToken(); | |
1397 if (!isascii(CurTok)) | |
1398 return ErrorP("Expected binary operator"); | |
1399 FnName = "binary"; | |
1400 FnName += (char)CurTok; | |
1401 Kind = 2; | |
1402 getNextToken(); | |
1403 | |
1404 // Read the precedence if present. | |
1405 if (CurTok == tok_number) { | |
1406 if (NumVal < 1 || NumVal > 100) | |
1407 return ErrorP("Invalid precedecnce: must be 1..100"); | |
1408 BinaryPrecedence = (unsigned)NumVal; | |
1409 getNextToken(); | |
1410 } | |
1411 break; | |
1412 } | |
1413 | |
1414 if (CurTok != '(') | |
1415 return ErrorP("Expected '(' in prototype"); | |
1416 | |
1417 std::vector<std::string> ArgNames; | |
1418 while (getNextToken() == tok_identifier) | |
1419 ArgNames.push_back(IdentifierStr); | |
1420 if (CurTok != ')') | |
1421 return ErrorP("Expected ')' in prototype"); | |
1422 | |
1423 // success. | |
1424 getNextToken(); // eat ')'. | |
1425 | |
1426 // Verify right number of names for operator. | |
1427 if (Kind && ArgNames.size() != Kind) | |
1428 return ErrorP("Invalid number of operands for operator"); | |
1429 | |
1430 return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence); | |
1431 } | |
1432 | |
1433 /// definition ::= 'def' prototype expression | |
1434 static FunctionAST *ParseDefinition() { | |
1435 getNextToken(); // eat def. | |
1436 PrototypeAST *Proto = ParsePrototype(); | |
1437 if (Proto == 0) return 0; | |
1438 | |
1439 if (ExprAST *E = ParseExpression()) | |
1440 return new FunctionAST(Proto, E); | |
1441 return 0; | |
1442 } | |
1443 | |
1444 /// toplevelexpr ::= expression | |
1445 static FunctionAST *ParseTopLevelExpr() { | |
1446 if (ExprAST *E = ParseExpression()) { | |
1447 // Make an anonymous proto. | |
1448 PrototypeAST *Proto = new PrototypeAST("", std::vector<std::string>()); | |
1449 return new FunctionAST(Proto, E); | |
1450 } | |
1451 return 0; | |
1452 } | |
1453 | |
1454 /// external ::= 'extern' prototype | |
1455 static PrototypeAST *ParseExtern() { | |
1456 getNextToken(); // eat extern. | |
1457 return ParsePrototype(); | |
1458 } | |
1459 | |
1460 //===----------------------------------------------------------------------===// | |
1461 // Code Generation | |
1462 //===----------------------------------------------------------------------===// | |
1463 | |
1464 static Module *TheModule; | |
1465 static IRBuilder<> Builder(getGlobalContext()); | |
1466 static std::map<std::string, AllocaInst*> NamedValues; | |
1467 static FunctionPassManager *TheFPM; | |
1468 | |
1469 Value *ErrorV(const char *Str) { Error(Str); return 0; } | |
1470 | |
1471 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of | |
1472 /// the function. This is used for mutable variables etc. | |
1473 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, | |
1474 const std::string &VarName) { | |
1475 IRBuilder<> TmpB(&TheFunction->getEntryBlock(), | |
1476 TheFunction->getEntryBlock().begin()); | |
1477 return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, | |
1478 VarName.c_str()); | |
1479 } | |
1480 | |
1481 Value *NumberExprAST::Codegen() { | |
1482 return ConstantFP::get(getGlobalContext(), APFloat(Val)); | |
1483 } | |
1484 | |
1485 Value *VariableExprAST::Codegen() { | |
1486 // Look this variable up in the function. | |
1487 Value *V = NamedValues[Name]; | |
1488 if (V == 0) return ErrorV("Unknown variable name"); | |
1489 | |
1490 // Load the value. | |
1491 return Builder.CreateLoad(V, Name.c_str()); | |
1492 } | |
1493 | |
1494 Value *UnaryExprAST::Codegen() { | |
1495 Value *OperandV = Operand->Codegen(); | |
1496 if (OperandV == 0) return 0; | |
1497 | |
1498 Function *F = TheModule->getFunction(std::string("unary")+Opcode); | |
1499 if (F == 0) | |
1500 return ErrorV("Unknown unary operator"); | |
1501 | |
1502 return Builder.CreateCall(F, OperandV, "unop"); | |
1503 } | |
1504 | |
1505 Value *BinaryExprAST::Codegen() { | |
1506 // Special case '=' because we don't want to emit the LHS as an expression. | |
1507 if (Op == '=') { | |
1508 // Assignment requires the LHS to be an identifier. | |
1509 VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS); | |
1510 if (!LHSE) | |
1511 return ErrorV("destination of '=' must be a variable"); | |
1512 // Codegen the RHS. | |
1513 Value *Val = RHS->Codegen(); | |
1514 if (Val == 0) return 0; | |
1515 | |
1516 // Look up the name. | |
1517 Value *Variable = NamedValues[LHSE->getName()]; | |
1518 if (Variable == 0) return ErrorV("Unknown variable name"); | |
1519 | |
1520 Builder.CreateStore(Val, Variable); | |
1521 return Val; | |
1522 } | |
1523 | |
1524 Value *L = LHS->Codegen(); | |
1525 Value *R = RHS->Codegen(); | |
1526 if (L == 0 || R == 0) return 0; | |
1527 | |
1528 switch (Op) { | |
1529 case '+': return Builder.CreateFAdd(L, R, "addtmp"); | |
1530 case '-': return Builder.CreateFSub(L, R, "subtmp"); | |
1531 case '*': return Builder.CreateFMul(L, R, "multmp"); | |
1532 case '<': | |
1533 L = Builder.CreateFCmpULT(L, R, "cmptmp"); | |
1534 // Convert bool 0/1 to double 0.0 or 1.0 | |
1535 return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), | |
1536 "booltmp"); | |
1537 default: break; | |
1538 } | |
1539 | |
1540 // If it wasn't a builtin binary operator, it must be a user defined one. Emit | |
1541 // a call to it. | |
1542 Function *F = TheModule->getFunction(std::string("binary")+Op); | |
1543 assert(F && "binary operator not found!"); | |
1544 | |
1545 Value *Ops[2] = { L, R }; | |
1546 return Builder.CreateCall(F, Ops, "binop"); | |
1547 } | |
1548 | |
1549 Value *CallExprAST::Codegen() { | |
1550 // Look up the name in the global module table. | |
1551 Function *CalleeF = TheModule->getFunction(Callee); | |
1552 if (CalleeF == 0) | |
1553 return ErrorV("Unknown function referenced"); | |
1554 | |
1555 // If argument mismatch error. | |
1556 if (CalleeF->arg_size() != Args.size()) | |
1557 return ErrorV("Incorrect # arguments passed"); | |
1558 | |
1559 std::vector<Value*> ArgsV; | |
1560 for (unsigned i = 0, e = Args.size(); i != e; ++i) { | |
1561 ArgsV.push_back(Args[i]->Codegen()); | |
1562 if (ArgsV.back() == 0) return 0; | |
1563 } | |
1564 | |
1565 return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); | |
1566 } | |
1567 | |
1568 Value *IfExprAST::Codegen() { | |
1569 Value *CondV = Cond->Codegen(); | |
1570 if (CondV == 0) return 0; | |
1571 | |
1572 // Convert condition to a bool by comparing equal to 0.0. | |
1573 CondV = Builder.CreateFCmpONE(CondV, | |
1574 ConstantFP::get(getGlobalContext(), APFloat(0.0)), | |
1575 "ifcond"); | |
1576 | |
1577 Function *TheFunction = Builder.GetInsertBlock()->getParent(); | |
1578 | |
1579 // Create blocks for the then and else cases. Insert the 'then' block at the | |
1580 // end of the function. | |
1581 BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction); | |
1582 BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else"); | |
1583 BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont"); | |
1584 | |
1585 Builder.CreateCondBr(CondV, ThenBB, ElseBB); | |
1586 | |
1587 // Emit then value. | |
1588 Builder.SetInsertPoint(ThenBB); | |
1589 | |
1590 Value *ThenV = Then->Codegen(); | |
1591 if (ThenV == 0) return 0; | |
1592 | |
1593 Builder.CreateBr(MergeBB); | |
1594 // Codegen of 'Then' can change the current block, update ThenBB for the PHI. | |
1595 ThenBB = Builder.GetInsertBlock(); | |
1596 | |
1597 // Emit else block. | |
1598 TheFunction->getBasicBlockList().push_back(ElseBB); | |
1599 Builder.SetInsertPoint(ElseBB); | |
1600 | |
1601 Value *ElseV = Else->Codegen(); | |
1602 if (ElseV == 0) return 0; | |
1603 | |
1604 Builder.CreateBr(MergeBB); | |
1605 // Codegen of 'Else' can change the current block, update ElseBB for the PHI. | |
1606 ElseBB = Builder.GetInsertBlock(); | |
1607 | |
1608 // Emit merge block. | |
1609 TheFunction->getBasicBlockList().push_back(MergeBB); | |
1610 Builder.SetInsertPoint(MergeBB); | |
1611 PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), 2, | |
1612 "iftmp"); | |
1613 | |
1614 PN->addIncoming(ThenV, ThenBB); | |
1615 PN->addIncoming(ElseV, ElseBB); | |
1616 return PN; | |
1617 } | |
1618 | |
1619 Value *ForExprAST::Codegen() { | |
1620 // Output this as: | |
1621 // var = alloca double | |
1622 // ... | |
1623 // start = startexpr | |
1624 // store start -> var | |
1625 // goto loop | |
1626 // loop: | |
1627 // ... | |
1628 // bodyexpr | |
1629 // ... | |
1630 // loopend: | |
1631 // step = stepexpr | |
1632 // endcond = endexpr | |
1633 // | |
1634 // curvar = load var | |
1635 // nextvar = curvar + step | |
1636 // store nextvar -> var | |
1637 // br endcond, loop, endloop | |
1638 // outloop: | |
1639 | |
1640 Function *TheFunction = Builder.GetInsertBlock()->getParent(); | |
1641 | |
1642 // Create an alloca for the variable in the entry block. | |
1643 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); | |
1644 | |
1645 // Emit the start code first, without 'variable' in scope. | |
1646 Value *StartVal = Start->Codegen(); | |
1647 if (StartVal == 0) return 0; | |
1648 | |
1649 // Store the value into the alloca. | |
1650 Builder.CreateStore(StartVal, Alloca); | |
1651 | |
1652 // Make the new basic block for the loop header, inserting after current | |
1653 // block. | |
1654 BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction); | |
1655 | |
1656 // Insert an explicit fall through from the current block to the LoopBB. | |
1657 Builder.CreateBr(LoopBB); | |
1658 | |
1659 // Start insertion in LoopBB. | |
1660 Builder.SetInsertPoint(LoopBB); | |
1661 | |
1662 // Within the loop, the variable is defined equal to the PHI node. If it | |
1663 // shadows an existing variable, we have to restore it, so save it now. | |
1664 AllocaInst *OldVal = NamedValues[VarName]; | |
1665 NamedValues[VarName] = Alloca; | |
1666 | |
1667 // Emit the body of the loop. This, like any other expr, can change the | |
1668 // current BB. Note that we ignore the value computed by the body, but don't | |
1669 // allow an error. | |
1670 if (Body->Codegen() == 0) | |
1671 return 0; | |
1672 | |
1673 // Emit the step value. | |
1674 Value *StepVal; | |
1675 if (Step) { | |
1676 StepVal = Step->Codegen(); | |
1677 if (StepVal == 0) return 0; | |
1678 } else { | |
1679 // If not specified, use 1.0. | |
1680 StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0)); | |
1681 } | |
1682 | |
1683 // Compute the end condition. | |
1684 Value *EndCond = End->Codegen(); | |
1685 if (EndCond == 0) return EndCond; | |
1686 | |
1687 // Reload, increment, and restore the alloca. This handles the case where | |
1688 // the body of the loop mutates the variable. | |
1689 Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str()); | |
1690 Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); | |
1691 Builder.CreateStore(NextVar, Alloca); | |
1692 | |
1693 // Convert condition to a bool by comparing equal to 0.0. | |
1694 EndCond = Builder.CreateFCmpONE(EndCond, | |
1695 ConstantFP::get(getGlobalContext(), APFloat(0.0)), | |
1696 "loopcond"); | |
1697 | |
1698 // Create the "after loop" block and insert it. | |
1699 BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction); | |
1700 | |
1701 // Insert the conditional branch into the end of LoopEndBB. | |
1702 Builder.CreateCondBr(EndCond, LoopBB, AfterBB); | |
1703 | |
1704 // Any new code will be inserted in AfterBB. | |
1705 Builder.SetInsertPoint(AfterBB); | |
1706 | |
1707 // Restore the unshadowed variable. | |
1708 if (OldVal) | |
1709 NamedValues[VarName] = OldVal; | |
1710 else | |
1711 NamedValues.erase(VarName); | |
1712 | |
1713 | |
1714 // for expr always returns 0.0. | |
1715 return Constant::getNullValue(Type::getDoubleTy(getGlobalContext())); | |
1716 } | |
1717 | |
1718 Value *VarExprAST::Codegen() { | |
1719 std::vector<AllocaInst *> OldBindings; | |
1720 | |
1721 Function *TheFunction = Builder.GetInsertBlock()->getParent(); | |
1722 | |
1723 // Register all variables and emit their initializer. | |
1724 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { | |
1725 const std::string &VarName = VarNames[i].first; | |
1726 ExprAST *Init = VarNames[i].second; | |
1727 | |
1728 // Emit the initializer before adding the variable to scope, this prevents | |
1729 // the initializer from referencing the variable itself, and permits stuff | |
1730 // like this: | |
1731 // var a = 1 in | |
1732 // var a = a in ... # refers to outer 'a'. | |
1733 Value *InitVal; | |
1734 if (Init) { | |
1735 InitVal = Init->Codegen(); | |
1736 if (InitVal == 0) return 0; | |
1737 } else { // If not specified, use 0.0. | |
1738 InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0)); | |
1739 } | |
1740 | |
1741 AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); | |
1742 Builder.CreateStore(InitVal, Alloca); | |
1743 | |
1744 // Remember the old variable binding so that we can restore the binding when | |
1745 // we unrecurse. | |
1746 OldBindings.push_back(NamedValues[VarName]); | |
1747 | |
1748 // Remember this binding. | |
1749 NamedValues[VarName] = Alloca; | |
1750 } | |
1751 | |
1752 // Codegen the body, now that all vars are in scope. | |
1753 Value *BodyVal = Body->Codegen(); | |
1754 if (BodyVal == 0) return 0; | |
1755 | |
1756 // Pop all our variables from scope. | |
1757 for (unsigned i = 0, e = VarNames.size(); i != e; ++i) | |
1758 NamedValues[VarNames[i].first] = OldBindings[i]; | |
1759 | |
1760 // Return the body computation. | |
1761 return BodyVal; | |
1762 } | |
1763 | |
1764 Function *PrototypeAST::Codegen() { | |
1765 // Make the function type: double(double,double) etc. | |
1766 std::vector<Type*> Doubles(Args.size(), | |
1767 Type::getDoubleTy(getGlobalContext())); | |
1768 FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), | |
1769 Doubles, false); | |
1770 | |
1771 Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule); | |
1772 | |
1773 // If F conflicted, there was already something named 'Name'. If it has a | |
1774 // body, don't allow redefinition or reextern. | |
1775 if (F->getName() != Name) { | |
1776 // Delete the one we just made and get the existing one. | |
1777 F->eraseFromParent(); | |
1778 F = TheModule->getFunction(Name); | |
1779 | |
1780 // If F already has a body, reject this. | |
1781 if (!F->empty()) { | |
1782 ErrorF("redefinition of function"); | |
1783 return 0; | |
1784 } | |
1785 | |
1786 // If F took a different number of args, reject. | |
1787 if (F->arg_size() != Args.size()) { | |
1788 ErrorF("redefinition of function with different # args"); | |
1789 return 0; | |
1790 } | |
1791 } | |
1792 | |
1793 // Set names for all arguments. | |
1794 unsigned Idx = 0; | |
1795 for (Function::arg_iterator AI = F->arg_begin(); Idx != Args.size(); | |
1796 ++AI, ++Idx) | |
1797 AI->setName(Args[Idx]); | |
1798 | |
1799 return F; | |
1800 } | |
1801 | |
1802 /// CreateArgumentAllocas - Create an alloca for each argument and register the | |
1803 /// argument in the symbol table so that references to it will succeed. | |
1804 void PrototypeAST::CreateArgumentAllocas(Function *F) { | |
1805 Function::arg_iterator AI = F->arg_begin(); | |
1806 for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { | |
1807 // Create an alloca for this variable. | |
1808 AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); | |
1809 | |
1810 // Store the initial value into the alloca. | |
1811 Builder.CreateStore(AI, Alloca); | |
1812 | |
1813 // Add arguments to variable symbol table. | |
1814 NamedValues[Args[Idx]] = Alloca; | |
1815 } | |
1816 } | |
1817 | |
1818 Function *FunctionAST::Codegen() { | |
1819 NamedValues.clear(); | |
1820 | |
1821 Function *TheFunction = Proto->Codegen(); | |
1822 if (TheFunction == 0) | |
1823 return 0; | |
1824 | |
1825 // If this is an operator, install it. | |
1826 if (Proto->isBinaryOp()) | |
1827 BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence(); | |
1828 | |
1829 // Create a new basic block to start insertion into. | |
1830 BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction); | |
1831 Builder.SetInsertPoint(BB); | |
1832 | |
1833 // Add all arguments to the symbol table and create their allocas. | |
1834 Proto->CreateArgumentAllocas(TheFunction); | |
1835 | |
1836 if (Value *RetVal = Body->Codegen()) { | |
1837 // Finish off the function. | |
1838 Builder.CreateRet(RetVal); | |
1839 | |
1840 // Validate the generated code, checking for consistency. | |
1841 verifyFunction(*TheFunction); | |
1842 | |
1843 // Optimize the function. | |
1844 TheFPM->run(*TheFunction); | |
1845 | |
1846 return TheFunction; | |
1847 } | |
1848 | |
1849 // Error reading body, remove function. | |
1850 TheFunction->eraseFromParent(); | |
1851 | |
1852 if (Proto->isBinaryOp()) | |
1853 BinopPrecedence.erase(Proto->getOperatorName()); | |
1854 return 0; | |
1855 } | |
1856 | |
1857 //===----------------------------------------------------------------------===// | |
1858 // Top-Level parsing and JIT Driver | |
1859 //===----------------------------------------------------------------------===// | |
1860 | |
1861 static ExecutionEngine *TheExecutionEngine; | |
1862 | |
1863 static void HandleDefinition() { | |
1864 if (FunctionAST *F = ParseDefinition()) { | |
1865 if (Function *LF = F->Codegen()) { | |
1866 fprintf(stderr, "Read function definition:"); | |
1867 LF->dump(); | |
1868 } | |
1869 } else { | |
1870 // Skip token for error recovery. | |
1871 getNextToken(); | |
1872 } | |
1873 } | |
1874 | |
1875 static void HandleExtern() { | |
1876 if (PrototypeAST *P = ParseExtern()) { | |
1877 if (Function *F = P->Codegen()) { | |
1878 fprintf(stderr, "Read extern: "); | |
1879 F->dump(); | |
1880 } | |
1881 } else { | |
1882 // Skip token for error recovery. | |
1883 getNextToken(); | |
1884 } | |
1885 } | |
1886 | |
1887 static void HandleTopLevelExpression() { | |
1888 // Evaluate a top-level expression into an anonymous function. | |
1889 if (FunctionAST *F = ParseTopLevelExpr()) { | |
1890 if (Function *LF = F->Codegen()) { | |
1891 // JIT the function, returning a function pointer. | |
1892 void *FPtr = TheExecutionEngine->getPointerToFunction(LF); | |
1893 | |
1894 // Cast it to the right type (takes no arguments, returns a double) so we | |
1895 // can call it as a native function. | |
1896 double (*FP)() = (double (*)())(intptr_t)FPtr; | |
1897 fprintf(stderr, "Evaluated to %f\n", FP()); | |
1898 } | |
1899 } else { | |
1900 // Skip token for error recovery. | |
1901 getNextToken(); | |
1902 } | |
1903 } | |
1904 | |
1905 /// top ::= definition | external | expression | ';' | |
1906 static void MainLoop() { | |
1907 while (1) { | |
1908 fprintf(stderr, "ready> "); | |
1909 switch (CurTok) { | |
1910 case tok_eof: return; | |
1911 case ';': getNextToken(); break; // ignore top-level semicolons. | |
1912 case tok_def: HandleDefinition(); break; | |
1913 case tok_extern: HandleExtern(); break; | |
1914 default: HandleTopLevelExpression(); break; | |
1915 } | |
1916 } | |
1917 } | |
1918 | |
1919 //===----------------------------------------------------------------------===// | |
1920 // "Library" functions that can be "extern'd" from user code. | |
1921 //===----------------------------------------------------------------------===// | |
1922 | |
1923 /// putchard - putchar that takes a double and returns 0. | |
1924 extern "C" | |
1925 double putchard(double X) { | |
1926 putchar((char)X); | |
1927 return 0; | |
1928 } | |
1929 | |
1930 /// printd - printf that takes a double prints it as "%f\n", returning 0. | |
1931 extern "C" | |
1932 double printd(double X) { | |
1933 printf("%f\n", X); | |
1934 return 0; | |
1935 } | |
1936 | |
1937 //===----------------------------------------------------------------------===// | |
1938 // Main driver code. | |
1939 //===----------------------------------------------------------------------===// | |
1940 | |
1941 int main() { | |
1942 InitializeNativeTarget(); | |
1943 LLVMContext &Context = getGlobalContext(); | |
1944 | |
1945 // Install standard binary operators. | |
1946 // 1 is lowest precedence. | |
1947 BinopPrecedence['='] = 2; | |
1948 BinopPrecedence['<'] = 10; | |
1949 BinopPrecedence['+'] = 20; | |
1950 BinopPrecedence['-'] = 20; | |
1951 BinopPrecedence['*'] = 40; // highest. | |
1952 | |
1953 // Prime the first token. | |
1954 fprintf(stderr, "ready> "); | |
1955 getNextToken(); | |
1956 | |
1957 // Make the module, which holds all the code. | |
1958 TheModule = new Module("my cool jit", Context); | |
1959 | |
1960 // Create the JIT. This takes ownership of the module. | |
1961 std::string ErrStr; | |
1962 TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create(); | |
1963 if (!TheExecutionEngine) { | |
1964 fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str()); | |
1965 exit(1); | |
1966 } | |
1967 | |
1968 FunctionPassManager OurFPM(TheModule); | |
1969 | |
1970 // Set up the optimizer pipeline. Start with registering info about how the | |
1971 // target lays out data structures. | |
1972 OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); | |
1973 // Provide basic AliasAnalysis support for GVN. | |
1974 OurFPM.add(createBasicAliasAnalysisPass()); | |
1975 // Promote allocas to registers. | |
1976 OurFPM.add(createPromoteMemoryToRegisterPass()); | |
1977 // Do simple "peephole" optimizations and bit-twiddling optzns. | |
1978 OurFPM.add(createInstructionCombiningPass()); | |
1979 // Reassociate expressions. | |
1980 OurFPM.add(createReassociatePass()); | |
1981 // Eliminate Common SubExpressions. | |
1982 OurFPM.add(createGVNPass()); | |
1983 // Simplify the control flow graph (deleting unreachable blocks, etc). | |
1984 OurFPM.add(createCFGSimplificationPass()); | |
1985 | |
1986 OurFPM.doInitialization(); | |
1987 | |
1988 // Set the global so the code gen can use this. | |
1989 TheFPM = &OurFPM; | |
1990 | |
1991 // Run the main "interpreter loop" now. | |
1992 MainLoop(); | |
1993 | |
1994 TheFPM = 0; | |
1995 | |
1996 // Print out all of the generated code. | |
1997 TheModule->dump(); | |
1998 | |
1999 return 0; | |
2000 } | |
2001 | |
2002 `Next: Conclusion and other useful LLVM tidbits <LangImpl8.html>`_ | |
2003 |