=======================================================
Building a JIT: Starting out with KaleidoscopeJIT
=======================================================

.. contents::
   :local:

Chapter 1 Introduction
======================

**Warning: This tutorial is currently being updated to account for ORC API
changes. Only Chapters 1 and 2 are up-to-date.**

**Example code from Chapters 3 to 5 will compile and run, but has not been
updated**

Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This
tutorial runs through the implementation of a JIT compiler using LLVM's
On-Request-Compilation (ORC) APIs. It begins with a simplified version of the
KaleidoscopeJIT class used in the
`Implementing a language with LLVM <LangImpl01.html>`_ tutorials and then
introduces new features like concurrent compilation, optimization, lazy
compilation and remote execution.

The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how
these APIs interact with other parts of LLVM, and to teach you how to recombine
them to build a custom JIT that is suited to your use-case.

The structure of the tutorial is:

- Chapter #1: Investigate the simple KaleidoscopeJIT class. This will
  introduce some of the basic concepts of the ORC JIT APIs, including the
  idea of an ORC *Layer*.

- `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding
  a new layer that will optimize IR and generated code.

- `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a
  Compile-On-Demand layer to lazily compile IR.

- `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by
  replacing the Compile-On-Demand layer with a custom layer that uses the ORC
  Compile Callbacks API directly to defer IR-generation until functions are
  called.

- `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into
  a remote process with reduced privileges using the JIT Remote APIs.

To provide input for our JIT we will use a lightly modified version of the
Kaleidoscope REPL from `Chapter 7 <LangImpl07.html>`_ of the "Implementing a
language in LLVM" tutorial.

Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API.
It was preceded by MCJIT, and before that by the (now deleted) legacy JIT.
These tutorials don't assume any experience with these earlier APIs, but
readers acquainted with them will see many familiar elements. Where appropriate
we will make this connection with the earlier APIs explicit to help people who
are transitioning from them to ORC.

JIT API Basics
==============

The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed,
rather than compiling whole programs to disk ahead of time as a traditional
compiler does. To support that aim our initial, bare-bones JIT API will have
just two functions:

1. ``Error addModule(std::unique_ptr<Module> M)``: Make the given IR module
   available for execution.
2. ``Expected<JITEvaluatedSymbol> lookup(StringRef Name)``: Search for pointers
   to symbols (functions or variables) that have been added to the JIT.

A basic use-case for this API, executing the 'main' function from a module,
will look like:

.. code-block:: c++

  JIT J;
  // Both calls can fail; cantFail() treats any failure as a fatal error,
  // which keeps this sketch short. argc and argv come from the enclosing main().
  cantFail(J.addModule(buildModule()));
  auto MainSym = cantFail(J.lookup("main"));
  auto *Main = (int (*)(int, char *[]))MainSym.getAddress();
  int Result = Main(argc, argv);

The APIs that we build in these tutorials will all be variations on this simple
theme. Behind this API we will refine the implementation of the JIT to add
support for concurrent compilation, optimization and lazy compilation.
Eventually we will extend the API itself to allow higher-level program
representations (e.g. ASTs) to be added to the JIT.

KaleidoscopeJIT
===============

In the previous section we described our API; now we examine a simple
implementation of it: the KaleidoscopeJIT class [1]_ that was used in the
`Implementing a language with LLVM <LangImpl01.html>`_ tutorials. We will use
the REPL code from `Chapter 7 <LangImpl07.html>`_ of that tutorial to supply the
input for our JIT: each time the user enters an expression the REPL will add a
new IR module containing the code for that expression to the JIT. If the
expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also
use the lookup method of our JIT class to find and execute the code for the
expression. In later chapters of this tutorial we will modify the REPL to enable
new interactions with our JIT class, but for now we will take this setup for
granted and focus our attention on the implementation of our JIT itself.

Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the
usual include guards and #includes [2]_, we get to the definition of our class:

.. code-block:: c++

  #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H
  #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H

  #include "llvm/ADT/StringRef.h"
  #include "llvm/ExecutionEngine/JITSymbol.h"
  #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
  #include "llvm/ExecutionEngine/Orc/Core.h"
  #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
  #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h"
  #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
  #include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
  #include "llvm/ExecutionEngine/SectionMemoryManager.h"
  #include "llvm/IR/DataLayout.h"
  #include "llvm/IR/LLVMContext.h"
  #include <memory>

  namespace llvm {
  namespace orc {

  class KaleidoscopeJIT {
  private:
    ExecutionSession ES;
    RTDyldObjectLinkingLayer ObjectLayer;
    IRCompileLayer CompileLayer;

    DataLayout DL;
    MangleAndInterner Mangle;
    ThreadSafeContext Ctx;

  public:
    KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
        : ObjectLayer(ES,
                      []() { return std::make_unique<SectionMemoryManager>(); }),
          CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
          DL(std::move(DL)), Mangle(ES, this->DL),
          Ctx(std::make_unique<LLVMContext>()) {
      ES.getMainJITDylib().addGenerator(
        cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(DL.getGlobalPrefix())));
    }

Our class begins with six member variables: an ExecutionSession member, ``ES``,
which provides context for our running JIT'd code (including the string pool,
global mutex, and error reporting facilities); an RTDyldObjectLinkingLayer,
``ObjectLayer``, that can be used to add object files to our JIT (though we will
not use it directly); an IRCompileLayer, ``CompileLayer``, that can be used to
add LLVM Modules to our JIT (and which builds on the ObjectLayer); a DataLayout
and MangleAndInterner, ``DL`` and ``Mangle``, that will be used for symbol
mangling (more on that later); and finally a ThreadSafeContext, ``Ctx``, which
owns the LLVMContext that clients will use when building IR files for the JIT.

Next up we have our class constructor, which takes a ``JITTargetMachineBuilder``
that will be used by our IRCompiler, and a ``DataLayout`` that we will use to
initialize our DL member. The constructor begins by initializing our
ObjectLayer. The ObjectLayer requires a reference to the ExecutionSession, and
a function object that will build a JIT memory manager for each module that is
added (a JIT memory manager manages memory allocations, memory permissions, and
registration of exception handlers for JIT'd code). For this we use a lambda
that returns a SectionMemoryManager, an off-the-shelf utility that provides all
the basic memory management functionality required for this chapter.

Next we initialize our CompileLayer. The CompileLayer needs three things: (1) a
reference to the ExecutionSession, (2) a reference to our object layer, and (3)
a compiler instance to use to perform the actual compilation from IR to object
files. We use the off-the-shelf ConcurrentIRCompiler utility as our compiler,
which we construct using this constructor's JITTargetMachineBuilder argument.
The ConcurrentIRCompiler utility will use the JITTargetMachineBuilder to build
LLVM TargetMachines (which are not thread safe) as needed for compiles.

After this, we initialize our supporting members: ``DL``, ``Mangle`` and
``Ctx`` with the input DataLayout, the ExecutionSession and DL member, and a new
default-constructed LLVMContext respectively. Now that our members have been
initialized, the one thing that remains to do is to tweak the configuration of
the *JITDylib* that we will store our code in. We want this dylib to contain not
only the symbols that we add to it, but also the symbols from our REPL process.
We do this by attaching a ``DynamicLibrarySearchGenerator`` instance using the
``DynamicLibrarySearchGenerator::GetForCurrentProcess`` method.

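The effect of attaching a generator can be pictured with a small standalone
model. The sketch below is not ORC code: ``ToyDylib`` and its ``define``,
``addGenerator`` and ``lookup`` methods are hypothetical names, and addresses
are plain integers. It only illustrates the idea that a lookup consults the
dylib's own symbol table first and, on a miss, asks the attached generators for
a definition.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <functional>
  #include <map>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for a JITDylib: a symbol table plus fallback
  // generators. This is a toy model, not the ORC API.
  class ToyDylib {
  public:
    // A generator is only asked for a symbol the table does not contain.
    using Generator =
        std::function<std::optional<std::uint64_t>(const std::string &)>;

    void define(const std::string &Name, std::uint64_t Addr) {
      Symbols[Name] = Addr;
    }

    void addGenerator(Generator G) {
      assert(G && "generator must be callable");
      Generators.push_back(std::move(G));
    }

    std::optional<std::uint64_t> lookup(const std::string &Name) {
      auto It = Symbols.find(Name);
      if (It != Symbols.end())
        return It->second;
      for (auto &G : Generators) {
        if (auto Addr = G(Name)) {
          Symbols[Name] = *Addr; // remember the generated definition
          return Addr;
        }
      }
      return std::nullopt; // neither the table nor any generator knows it
    }

  private:
    std::map<std::string, std::uint64_t> Symbols;
    std::vector<Generator> Generators;
  };

In this toy, a generator that resolves names such as ``sin`` plays the role that
the process-symbol generator plays for the REPL: code added to the dylib can
refer to functions it never defined itself.
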
.. code-block:: c++

  static Expected<std::unique_ptr<KaleidoscopeJIT>> Create() {
    auto JTMB = JITTargetMachineBuilder::detectHost();

    if (!JTMB)
      return JTMB.takeError();

    auto DL = JTMB->getDefaultDataLayoutForTarget();
    if (!DL)
      return DL.takeError();

    return std::make_unique<KaleidoscopeJIT>(std::move(*JTMB), std::move(*DL));
  }

  const DataLayout &getDataLayout() const { return DL; }

  LLVMContext &getContext() { return *Ctx.getContext(); }

Next we have a named constructor, ``Create``, which will build a KaleidoscopeJIT
instance that is configured to generate code for our host process. It does this
by first generating a JITTargetMachineBuilder instance using that class's
detectHost method and then using that instance to generate a DataLayout for
the target process. Each of these operations can fail, so each returns its
result wrapped in an Expected value [3]_ that we must check for error before
continuing. If both operations succeed we can unwrap their results (using the
dereference operator) and pass them into KaleidoscopeJIT's constructor on the
last line of the function.

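For readers who have not met this check-then-dereference style before, here is
a minimal standalone sketch of the pattern. It does not use LLVM:
``Result<T>`` is a hypothetical, much-simplified stand-in for ``Expected<T>``,
and ``detectHost``, ``defaultLayoutFor``, ``ToyJit`` and ``createToyJit`` are
invented for illustration, with strings in place of the real builder and
layout types.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <string>
  #include <utility>
  #include <variant>

  // Hypothetical stand-in for llvm::Expected<T>: a value or an error message.
  template <typename T> class Result {
    struct Err {
      std::string Msg;
    };

  public:
    Result(T Value) : Storage(std::move(Value)) {}
    static Result failure(std::string Msg) {
      return Result(Err{std::move(Msg)});
    }

    // True when a value is present, mirroring the "if (!X)" checks above.
    explicit operator bool() const { return std::holds_alternative<T>(Storage); }

    T &operator*() {
      assert(std::holds_alternative<T>(Storage) && "dereferenced an error");
      return std::get<T>(Storage);
    }

    // Only meaningful when the Result holds an error.
    std::string takeError() const { return std::get<Err>(Storage).Msg; }

  private:
    explicit Result(Err E) : Storage(std::move(E)) {}
    std::variant<T, Err> Storage;
  };

  struct ToyJit {
    std::string Host;
    std::string Layout;
  };

  // Invented fallible steps, standing in for host detection and layout lookup.
  Result<std::string> detectHost(bool Ok) {
    if (!Ok)
      return Result<std::string>::failure("cannot detect host");
    return std::string("toy-host");
  }

  Result<std::string> defaultLayoutFor(const std::string &Host) {
    if (Host.empty())
      return Result<std::string>::failure("unknown target");
    return std::string("toy-layout");
  }

  // Same shape as a named constructor: check each step, forward the first
  // error, and only dereference results that are known to hold a value.
  Result<std::unique_ptr<ToyJit>> createToyJit(bool HostOk) {
    auto Host = detectHost(HostOk);
    if (!Host)
      return Result<std::unique_ptr<ToyJit>>::failure(Host.takeError());

    auto Layout = defaultLayoutFor(*Host);
    if (!Layout)
      return Result<std::unique_ptr<ToyJit>>::failure(Layout.takeError());

    return std::make_unique<ToyJit>(ToyJit{std::move(*Host), std::move(*Layout)});
  }

The real ``Expected`` type does more than this sketch (for example, it insists
that errors are checked), so treat the model as a reading aid for the control
flow only.
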
Following the named constructor we have the ``getDataLayout()`` and
``getContext()`` methods. These are used to make data structures created and
managed by the JIT (especially the LLVMContext) available to the REPL code that
will build our IR modules.

.. code-block:: c++

  void addModule(std::unique_ptr<Module> M) {
    cantFail(CompileLayer.add(ES.getMainJITDylib(),
                              ThreadSafeModule(std::move(M), Ctx)));
  }

  Expected<JITEvaluatedSymbol> lookup(StringRef Name) {
    return ES.lookup({&ES.getMainJITDylib()}, Mangle(Name.str()));
  }

Now we come to the first of our JIT API methods: addModule. This method is
responsible for adding IR to the JIT and making it available for execution.
Note that, unlike the ``Error``-returning signature sketched earlier, this
version returns ``void`` and wraps the layer call in ``cantFail``, so a failure
to add the module is treated as a fatal error. In
this initial implementation of our JIT we will make our modules "available for
execution" by adding them to the CompileLayer, which will in turn store the
Module in the main JITDylib. This process will create new symbol table entries
in the JITDylib for each definition in the module, and will defer compilation of
the module until any of its definitions is looked up. Note that this is not lazy
compilation: just referencing a definition, even if it is never used, will be
enough to trigger compilation. In later chapters we will teach our JIT to defer
compilation of functions until they're actually called. To add our Module we
must first wrap it in a ThreadSafeModule instance, which manages the lifetime of
the Module's LLVMContext (our Ctx member) in a thread-friendly way. In our
example, all modules will share the Ctx member, which will exist for the
duration of the JIT. Once we switch to concurrent compilation in later chapters
we will use a new context per module.

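The "symbol table entries now, compilation on first lookup" behaviour described
above can be modelled in a few lines of standalone C++. This is a toy, not ORC:
``ToyModule`` and ``ToyJit`` are hypothetical, and "compiling" is reduced to
setting a flag and bumping a counter. Like the cut-down JIT in this chapter, the
toy assumes symbols are never redefined.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for an IR module: the symbols it defines and
  // whether it has been "compiled" yet.
  struct ToyModule {
    std::string Name;
    std::vector<std::string> Definitions;
    bool Compiled = false;
  };

  class ToyJit {
  public:
    // Adding a module only records its definitions; nothing is compiled.
    void addModule(ToyModule M) {
      auto Shared = std::make_shared<ToyModule>(std::move(M));
      for (const auto &Def : Shared->Definitions) {
        assert(SymbolTable.count(Def) == 0 && "redefinition is not supported");
        SymbolTable[Def] = Shared;
      }
    }

    // Looking up any definition compiles the whole module that owns it.
    // Returns false for unknown symbols.
    bool lookup(const std::string &Symbol) {
      auto It = SymbolTable.find(Symbol);
      if (It == SymbolTable.end())
        return false;
      ToyModule &M = *It->second;
      if (!M.Compiled) {
        M.Compiled = true; // stands in for running the compiler
        ++CompileCount;
      }
      return true;
    }

    int compileCount() const { return CompileCount; }

  private:
    std::map<std::string, std::shared_ptr<ToyModule>> SymbolTable;
    int CompileCount = 0;
  };

Adding a module that defines ``foo`` and ``bar`` leaves ``compileCount()`` at
zero; the first ``lookup("foo")`` raises it to one, and a later
``lookup("bar")`` does not raise it again, because both symbols live in the
same module.
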
Our last method is ``lookup``, which allows us to look up addresses for
function and variable definitions added to the JIT based on their symbol names.
As noted above, lookup will implicitly trigger compilation for any symbol
that has not already been compiled. Our lookup method calls through to
``ExecutionSession::lookup``, passing in a list of dylibs to search (in our case
just the main dylib), and the symbol name to search for, with a twist: we have
to *mangle* the name of the symbol we're searching for first. The ORC JIT
components use mangled symbols internally the same way a static compiler and
linker would, rather than using plain IR symbol names. This allows JIT'd code
to interoperate easily with precompiled code in the application or shared
libraries. The kind of mangling will depend on the DataLayout, which in turn
depends on the target platform. To allow us to remain portable and search based
on the un-mangled name, we just reproduce this mangling ourselves using our
``Mangle`` member function object.

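As a rough illustration of why the DataLayout matters here, the standalone
sketch below models only one part of mangling: the global prefix character.
Mach-O targets prepend an underscore to global symbol names, while ELF targets
use no prefix. ``toyMangle`` is a hypothetical helper, not an LLVM function, and
real mangling covers more cases than this.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical helper: prepend the target's global prefix, if it has one.
  // A prefix of '\0' means "this target uses no prefix".
  std::string toyMangle(const std::string &Name, char GlobalPrefix) {
    assert(!Name.empty() && "cannot mangle an empty name");
    if (GlobalPrefix == '\0')
      return Name;
    return std::string(1, GlobalPrefix) + Name;
  }

With this helper, ``toyMangle("main", '_')`` yields ``_main`` and
``toyMangle("main", '\0')`` yields ``main``, which is why a lookup keyed on the
plain IR name could miss on one platform and succeed on another.
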
This brings us to the end of Chapter 1 of Building a JIT. You now have a basic
but fully functioning JIT stack that you can use to take LLVM IR and make it
executable within the context of your JIT process. In the next chapter we'll
look at how to extend this JIT to produce better quality code, and in the
process take a deeper look at the ORC layer concept.

`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example. To build this
example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h
   :language: c++

.. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a
       simplifying assumption: symbols cannot be re-defined. This will make it
       impossible to re-define symbols in the REPL, but will make our symbol
       lookup logic simpler. Re-introducing support for symbol redefinition is
       left as an exercise for the reader. (The KaleidoscopeJIT.h used in the
       original tutorials will be a helpful reference).

.. [2] +-----------------------------+-----------------------------------------------+
       | File                        | Reason for inclusion                          |
       +=============================+===============================================+
       | JITSymbol.h                 | Defines the lookup result type                |
       |                             | JITEvaluatedSymbol                            |
       +-----------------------------+-----------------------------------------------+
       | CompileUtils.h              | Provides the ConcurrentIRCompiler class.      |
       +-----------------------------+-----------------------------------------------+
       | Core.h                      | Core utilities such as ExecutionSession and   |
       |                             | JITDylib.                                     |
       +-----------------------------+-----------------------------------------------+
       | ExecutionUtils.h            | Provides the DynamicLibrarySearchGenerator    |
       |                             | class.                                        |
       +-----------------------------+-----------------------------------------------+
       | IRCompileLayer.h            | Provides the IRCompileLayer class.            |
       +-----------------------------+-----------------------------------------------+
       | JITTargetMachineBuilder.h   | Provides the JITTargetMachineBuilder class.   |
       +-----------------------------+-----------------------------------------------+
       | RTDyldObjectLinkingLayer.h  | Provides the RTDyldObjectLinkingLayer class.  |
       +-----------------------------+-----------------------------------------------+
       | SectionMemoryManager.h      | Provides the SectionMemoryManager class.      |
       +-----------------------------+-----------------------------------------------+
       | DataLayout.h                | Provides the DataLayout class.                |
       +-----------------------------+-----------------------------------------------+
       | LLVMContext.h               | Provides the LLVMContext class.               |
       +-----------------------------+-----------------------------------------------+

.. [3] See the ErrorHandling section in the LLVM Programmer's Manual
       (https://llvm.org/docs/ProgrammersManual.html#error-handling)