==============================================
Kaleidoscope: Adding JIT and Optimizer Support
==============================================

.. contents::
   :local:

Chapter 4 Introduction
======================

Welcome to Chapter 4 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
of a simple language and added support for generating LLVM IR. This
chapter describes two new techniques: adding optimizer support to your
language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope
language.

Trivial Constant Folding
========================

Our demonstration for Chapter 3 is elegant and easy to extend.
Unfortunately, it does not produce wonderful code. The IRBuilder,
however, does give us obvious optimizations when compiling simple code:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      ret double %addtmp
    }

This code is not a literal transcription of the AST built by parsing the
input. That would be:

::

    ready> def test(x) 1+2+x;
    Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 2.000000e+00, 1.000000e+00
      %addtmp1 = fadd double %addtmp, %x
      ret double %addtmp1
    }

Constant folding, as seen above, is a very common and very important
optimization: so much so that many language implementors implement
constant folding support in their AST representation.

With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
whether there is a constant folding opportunity when you call it. If
so, it just does the constant fold and returns the constant instead of
creating an instruction.

Well, that was easy :). In practice, we recommend always using
``IRBuilder`` when generating code like this. It has no "syntactic
overhead" for its use (you don't have to uglify your compiler with
constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with
a macro preprocessor or that use a lot of constants).

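
To see the folding happen in isolation, here is a minimal sketch (the
helper name is ours, not part of the tutorial's code) that builds
``1.0 + 2.0`` through the builder:

.. code-block:: c++

    #include "llvm/ADT/APFloat.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/IRBuilder.h"

    // Both operands are constants, so CreateFAdd never emits an fadd
    // instruction: it folds the values and returns a ConstantFP of 3.0.
    llvm::Value *BuildFoldedAdd(llvm::IRBuilder<> &Builder,
                                llvm::LLVMContext &Ctx) {
      llvm::Value *L = llvm::ConstantFP::get(Ctx, llvm::APFloat(1.0));
      llvm::Value *R = llvm::ConstantFP::get(Ctx, llvm::APFloat(2.0));
      return Builder.CreateFAdd(L, R, "addtmp");
    }
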
On the other hand, the ``IRBuilder`` is limited by the fact that it does
all of its analysis inline with the code as it is built. If you take a
slightly more complex example:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double 3.000000e+00, %x
      %addtmp1 = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp1
      ret double %multmp
    }

In this case, the LHS and RHS of the multiplication are the same value.
We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
instead of computing "``x+3``" twice.

Unfortunately, no amount of local analysis will be able to detect and
correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
Subexpression Elimination (CSE) to delete the redundant add instruction.
Fortunately, LLVM provides a broad range of optimizations that you can
use, in the form of "passes".

LLVM Optimization Passes
========================

LLVM provides many optimization passes, which do many different sorts of
things and have different tradeoffs. Unlike other systems, LLVM doesn't
hold to the mistaken notion that one set of optimizations is right for
all languages and for all situations. LLVM allows a compiler implementor
to make complete decisions about what optimizations to use, in which
order, and in what situation.

As a concrete example, LLVM supports "whole module" passes, which look
across as large a body of code as they can (often a whole file, but if
run at link time, this can be a substantial portion of the whole
program). It also supports and includes "per-function" passes, which
operate on a single function at a time without looking at other
functions. For more information on passes and how they are run, see the
`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the
`List of LLVM Passes <../Passes.html>`_.

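
To make the two flavors concrete, here is a rough sketch of how each
kind of pass manager is driven (the pass choices and function names are
illustrative, not the setup Kaleidoscope uses; that setup follows below):

.. code-block:: c++

    #include "llvm/IR/Module.h"
    #include "llvm/PassManager.h"
    #include "llvm/Transforms/IPO.h"
    #include "llvm/Transforms/Scalar.h"

    // A "whole module" pipeline: its passes see (and may rewrite) the
    // entire module at once.
    void RunModulePasses(llvm::Module &M) {
      llvm::PassManager MPM;
      MPM.add(llvm::createFunctionInliningPass());  // an inter-procedural pass
      MPM.run(M);
    }

    // A "per-function" pipeline: its passes see one function at a time.
    void RunFunctionPasses(llvm::Module &M, llvm::Function &F) {
      llvm::FunctionPassManager FPM(&M);
      FPM.add(llvm::createCFGSimplificationPass());
      FPM.doInitialization();
      FPM.run(F);
      FPM.doFinalization();
    }
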
For Kaleidoscope, we are currently generating functions on the fly, one
at a time, as the user types them in. We aren't shooting for the
ultimate optimization experience in this setting, but we also want to
catch the easy and quick stuff where possible. As such, we will choose
to run a few per-function optimizations as the user types the function
in. If we wanted to make a "static Kaleidoscope compiler", we would use
exactly the code we have now, except that we would defer running the
optimizer until the entire file has been parsed.

In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../WritingAnLLVMPass.html#passmanager>`_ to hold
and organize the LLVM optimizations that we want to run. Once we have
that, we can add a set of optimizations to run. The code looks like
this:

.. code-block:: c++

    FunctionPassManager OurFPM(TheModule);

    // Set up the optimizer pipeline.  Start with registering info about how the
    // target lays out data structures.
    OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout()));
    // Provide basic AliasAnalysis support for GVN.
    OurFPM.add(createBasicAliasAnalysisPass());
    // Do simple "peephole" optimizations and bit-twiddling optzns.
    OurFPM.add(createInstructionCombiningPass());
    // Reassociate expressions.
    OurFPM.add(createReassociatePass());
    // Eliminate Common SubExpressions.
    OurFPM.add(createGVNPass());
    // Simplify the control flow graph (deleting unreachable blocks, etc).
    OurFPM.add(createCFGSimplificationPass());

    OurFPM.doInitialization();

    // Set the global so the code gen can use this.
    TheFPM = &OurFPM;

    // Run the main "interpreter loop" now.
    MainLoop();

This code defines a ``FunctionPassManager``, "``OurFPM``". It requires a
pointer to the ``Module`` to construct itself. Once it is set up, we use
a series of "add" calls to add a bunch of LLVM passes. The first pass is
basically boilerplate: it adds a pass so that later optimizations know
how the data structures in the program are laid out. The
"``TheExecutionEngine``" variable is related to the JIT, which we will
get to in the next section.

In this case, we choose to add four optimization passes. The passes we
chose here are a pretty standard set of "cleanup" optimizations that are
useful for a wide variety of code. I won't delve into what they do but,
believe me, they are a good starting place :).

Once the PassManager is set up, we need to make use of it. We do this by
running it after our newly created function is constructed (in
``FunctionAST::Codegen``), but before it is returned to the client:

.. code-block:: c++

    if (Value *RetVal = Body->Codegen()) {
      // Finish off the function.
      Builder.CreateRet(RetVal);

      // Validate the generated code, checking for consistency.
      verifyFunction(*TheFunction);

      // Optimize the function.
      TheFPM->run(*TheFunction);

      return TheFunction;
    }

As you can see, this is pretty straightforward. The
``FunctionPassManager`` optimizes and updates the LLVM Function\* in
place, improving (hopefully) its body. With this in place, we can try
our test above again:

::

    ready> def test(x) (1+2+x)*(x+(1+2));
    ready> Read function definition:
    define double @test(double %x) {
    entry:
      %addtmp = fadd double %x, 3.000000e+00
      %multmp = fmul double %addtmp, %addtmp
      ret double %multmp
    }

As expected, we now get our nicely optimized code, saving a floating
point add instruction from every execution of this function.

LLVM provides a wide variety of optimizations that can be used in
certain circumstances. Some `documentation about the various
passes <../Passes.html>`_ is available, but it isn't very complete.
Another good source of ideas is to look at the passes that ``Clang``
runs to get started. The "``opt``" tool allows you to experiment with
passes from the command line, so you can see if they do anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!

Adding a JIT Compiler
=====================

Code that is available in LLVM IR can have a wide variety of tools
applied to it. For example, you can run optimizations on it (as we did
above), you can dump it out in textual or binary forms, you can compile
the code to an assembly file (.s) for some target, or you can JIT
compile it. The nice thing about the LLVM IR representation is that it
is the "common currency" between many different parts of the compiler.

In this section, we'll add JIT compiler support to our interpreter. The
basic idea that we want for Kaleidoscope is to have the user enter
function bodies as they do now, but immediately evaluate the top-level
expressions they type in. For example, if they type in "1 + 2;", we
should evaluate and print out 3. If they define a function, they should
be able to call it from the command line.

In order to do this, we first declare and initialize the JIT. This is
done by adding a global variable and a call in ``main``:

.. code-block:: c++

    static ExecutionEngine *TheExecutionEngine;
    ...
    int main() {
      ..
      // Create the JIT.  This takes ownership of the module.
      TheExecutionEngine = EngineBuilder(TheModule).create();
      ..
    }

This creates an abstract "Execution Engine" which can be either a JIT
compiler or the LLVM interpreter. LLVM will automatically pick a JIT
compiler for you if one is available for your platform, otherwise it
will fall back to the interpreter.

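
For the JIT to actually be available, the full code listing at the end
of this chapter also pulls in the JIT headers and initializes the native
target before creating the engine. Roughly sketched (treat the exact
shape of the error handling here as an assumption and defer to the full
listing):

.. code-block:: c++

    #include "llvm/ExecutionEngine/ExecutionEngine.h"
    #include "llvm/ExecutionEngine/JIT.h"    // linking this in enables the JIT
    #include "llvm/Support/TargetSelect.h"

    int main() {
      InitializeNativeTarget();  // register the host target so a JIT can be chosen
      ...
      // Create the JIT.  This takes ownership of the module.
      std::string ErrStr;
      TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&ErrStr).create();
      if (!TheExecutionEngine) {
        fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
        exit(1);
      }
      ...
    }
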
Once the ``ExecutionEngine`` is created, the JIT is ready to be used.
There are a variety of APIs that are useful, but the simplest one is the
"``getPointerToFunction(F)``" method. This method JIT compiles the
specified LLVM Function and returns a function pointer to the generated
machine code. In our case, this means that we can change the code that
parses a top-level expression to look like this:

.. code-block:: c++

    static void HandleTopLevelExpression() {
      // Evaluate a top-level expression into an anonymous function.
      if (FunctionAST *F = ParseTopLevelExpr()) {
        if (Function *LF = F->Codegen()) {
          LF->dump();  // Dump the function for exposition purposes.

          // JIT the function, returning a function pointer.
          void *FPtr = TheExecutionEngine->getPointerToFunction(LF);

          // Cast it to the right type (takes no arguments, returns a double) so we
          // can call it as a native function.
          double (*FP)() = (double (*)())(intptr_t)FPtr;
          fprintf(stderr, "Evaluated to %f\n", FP());
        }

Recall that we compile top-level expressions into a self-contained LLVM
function that takes no arguments and returns the computed double.
Because the LLVM JIT compiler matches the native platform ABI, this
means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means there is no difference
between JIT-compiled code and native machine code that is statically
linked into your application.

With just these two changes, let's see how Kaleidoscope works now!

::

    ready> 4+5;
    Read top-level expression:
    define double @0() {
    entry:
      ret double 9.000000e+00
    }

    Evaluated to 9.000000

Well, this looks like it is basically working. The dump of the function
shows the "no argument function that always returns double" that we
synthesize for each top-level expression that is typed in. This
demonstrates very basic functionality, but can we do more?

::

    ready> def testfunc(x y) x + y*2;
    Read function definition:
    define double @testfunc(double %x, double %y) {
    entry:
      %multmp = fmul double %y, 2.000000e+00
      %addtmp = fadd double %multmp, %x
      ret double %addtmp
    }

    ready> testfunc(4, 10);
    Read top-level expression:
    define double @1() {
    entry:
      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
      ret double %calltmp
    }

    Evaluated to 24.000000

This illustrates that we can now call user code, but there is something
a bit subtle going on here. Note that we only invoke the JIT on the
anonymous functions that *call testfunc*, but we never invoke it on
*testfunc* itself. What actually happened here is that the JIT scanned
for all non-JIT'd functions transitively called from the anonymous
function and compiled all of them before returning from
``getPointerToFunction()``.

The JIT provides a number of other more advanced interfaces for things
like freeing allocated machine code, re-JIT'ing functions to update
them, etc. However, even with this simple code, we get some surprisingly
powerful capabilities - check this out (I removed the dump of the
anonymous functions; you should get the idea by now :) :

::

    ready> extern sin(x);
    Read extern:
    declare double @sin(double)

    ready> extern cos(x);
    Read extern:
    declare double @cos(double)

    ready> sin(1.0);
    Read top-level expression:
    define double @2() {
    entry:
      ret double 0x3FEAED548F090CEE
    }

    Evaluated to 0.841471

    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
    Read function definition:
    define double @foo(double %x) {
    entry:
      %calltmp = call double @sin(double %x)
      %multmp = fmul double %calltmp, %calltmp
      %calltmp2 = call double @cos(double %x)
      %multmp4 = fmul double %calltmp2, %calltmp2
      %addtmp = fadd double %multmp, %multmp4
      ret double %addtmp
    }

    ready> foo(4.0);
    Read top-level expression:
    define double @3() {
    entry:
      %calltmp = call double @foo(double 4.000000e+00)
      ret double %calltmp
    }

    Evaluated to 1.000000

Whoa, how does the JIT know about sin and cos? The answer is
surprisingly simple: in this example, the JIT started execution of a
function and got to a function call. It realized that the function was
not yet JIT compiled and invoked the standard set of routines to resolve
the function. In this case, there is no body defined for the function,
so the JIT ended up calling "``dlsym("sin")``" on the Kaleidoscope
process itself. Since "``sin``" is defined within the JIT's address
space, it simply patches up calls in the module to call the libm version
of ``sin`` directly.

The LLVM JIT provides a number of interfaces (look in the
``ExecutionEngine.h`` file) for controlling how unknown functions get
resolved. It allows you to establish explicit mappings between IR
objects and addresses (useful for LLVM global variables that you want to
map to static tables, for example), allows you to dynamically decide on
the fly based on the function name, and even allows you to have the JIT
compile functions lazily the first time they're called.

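
As a rough sketch of two of those hooks (the function being mapped here,
"myext", is purely illustrative and not part of the tutorial's code):

.. code-block:: c++

    #include "llvm/ExecutionEngine/ExecutionEngine.h"
    #include "llvm/IR/Module.h"
    #include <stdint.h>
    #include <string>

    // A host-side implementation we want JIT'd code to be able to call.
    extern "C" double MyExtImpl(double X) { return X * 2.0; }

    // Consulted by the JIT whenever it hits a function it cannot resolve.
    static void *ResolveByName(const std::string &Name) {
      if (Name == "myext")
        return (void *)(intptr_t)MyExtImpl;
      return 0;  // fall back to the default resolution (dlsym, etc.)
    }

    void InstallMappings(llvm::ExecutionEngine *EE, llvm::Module *M) {
      // Explicit mapping: calls to the IR function "myext" jump to MyExtImpl.
      if (llvm::Function *F = M->getFunction("myext"))
        EE->addGlobalMapping(F, (void *)(intptr_t)MyExtImpl);

      // Name-based hook, used lazily at resolution time.
      EE->InstallLazyFunctionCreator(ResolveByName);
    }
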
One interesting application of this is that we can now extend the
language by writing arbitrary C++ code to implement operations. For
example, if we add:

.. code-block:: c++

    /// putchard - putchar that takes a double and returns 0.
    extern "C"
    double putchard(double X) {
      putchar((char)X);
      return 0;
    }

Now we can produce simple output to the console by using things like:
"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities
in Kaleidoscope.

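
For instance, a hypothetical ``printd`` helper in the same style (the
name is ours; it is not part of this chapter's listing) would let
Kaleidoscope print whole double values instead of single characters:

.. code-block:: c++

    #include <stdio.h>

    /// printd - print a double followed by a newline, callable from
    /// Kaleidoscope as "extern printd(x); printd(42);".
    extern "C" double printd(double X) {
      printf("%f\n", X);
      return 0;
    }
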
This completes the JIT and optimizer chapter of the Kaleidoscope
tutorial. At this point, we can compile a non-Turing-complete
programming language, and optimize and JIT compile it in a user-driven
way. Next up we'll look into `extending the language with control flow
constructs <LangImpl5.html>`_, tackling some interesting LLVM IR issues
along the way.

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
the LLVM JIT and optimizer. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
    # Run
    ./toy

If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved
properly at runtime.

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
   :language: c++

`Next: Extending the language: control flow <LangImpl5.html>`_