=====================================================================
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

**Warning: This text is currently out of date due to ORC API updates.**

**The example code has been updated and can be used. The text will be updated
once the API churn dies down.**

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and
ObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series the llvm *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager on
a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us with a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: When we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
IRTransformLayer works in more detail below, but the interface is simple: the
constructor for this layer takes a reference to the layer below (as all layers
do) plus an *IR optimization function* that it will apply to each Module that
is added via addModule:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    RTDyldObjectLinkingLayer ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

    using OptimizeFunction =
        std::function<std::shared_ptr<Module>(std::shared_ptr<Module>)>;

    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;

  public:
    using ModuleHandle = decltype(OptimizeLayer)::ModuleHandleT;

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
          OptimizeLayer(CompileLayer,
                        [this](std::shared_ptr<Module> M) {
                          return optimizeModule(std::move(M));
                        }) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a typedef for our optimization function.
In this case we use a std::function (a handy wrapper for "function-like" things)
from a single std::shared_ptr<Module> input to a std::shared_ptr<Module> output.
With our optimization function typedef in place we can declare our
OptimizeLayer, which sits on top of our CompileLayer.

To initialize our OptimizeLayer we pass it a reference to the CompileLayer
below (standard practice for layers), and we initialize the OptimizeFunction
using a lambda that calls out to an "optimizeModule" function that we will
define below.

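The std::function signature used for OptimizeFunction can be tried out in
isolation. The sketch below is a toy stand-in with no LLVM dependency
(``ToyModule`` is a hypothetical placeholder for llvm::Module); it shows a
std::function from shared_ptr to shared_ptr wrapping a lambda, which is
essentially what happens when OptimizeLayer's constructor stores the lambda we
pass it:

.. code-block:: c++

  #include <functional>
  #include <iostream>
  #include <memory>
  #include <string>

  // Hypothetical stand-in for llvm::Module.
  struct ToyModule { std::string IR; };

  // Same shape as the tutorial's OptimizeFunction typedef, but over ToyModule.
  using OptimizeFunction =
      std::function<std::shared_ptr<ToyModule>(std::shared_ptr<ToyModule>)>;

  int main() {
    // A lambda with a matching call signature converts implicitly to the
    // std::function type; this is how the constructor's lambda becomes the
    // layer's stored transform.
    OptimizeFunction Optimize = [](std::shared_ptr<ToyModule> M) {
      M->IR = "optimized(" + M->IR + ")";
      return M;
    };

    auto M = std::make_shared<ToyModule>(ToyModule{"add_fn"});
    M = Optimize(std::move(M));
    std::cout << M->IR << "\n"; // prints: optimized(add_fn)
  }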
.. code-block:: c++

  // ...
  auto Resolver = createLambdaResolver(
      [&](const std::string &Name) {
        if (auto Sym = OptimizeLayer.findSymbol(Name, false))
          return Sym;
        return JITSymbol(nullptr);
      },
  // ...

.. code-block:: c++

  // ...
  return cantFail(OptimizeLayer.addModule(std::move(M),
                                          std::move(Resolver)));
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  // ...

.. code-block:: c++

  // ...
  cantFail(OptimizeLayer.removeModule(H));
  // ...

Next we need to replace references to 'CompileLayer' with references to
OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In
addModule we need to be careful to replace both references: the findSymbol call
inside our resolver, and the call through to addModule.

.. code-block:: c++

  std::shared_ptr<Module> optimizeModule(std::shared_ptr<Module> M) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M)
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual optimization:
*optimizeModule*. This function sets up a FunctionPassManager, adds some passes
to it, runs it over every function in the module, and then returns the mutated
module. The specific optimizations are the same ones used in
`Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series. Readers may visit that chapter for a more in-depth
discussion of these, and of IR optimization in general.

And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
addModule the OptimizeLayer will call our optimizeModule function before passing
the transformed module on to the CompileLayer below. Of course, we could have
called optimizeModule directly in our addModule function and not gone to the
bother of using the IRTransformLayer, but doing so gives us another opportunity
to see how layers compose. It also provides a neat entry point to the *layer*
concept itself, because IRTransformLayer turns out to be one of the simplest
implementations of the layer concept that can be devised:

.. code-block:: c++

  template <typename BaseLayerT, typename TransformFtor>
  class IRTransformLayer {
  public:
    using ModuleHandleT = typename BaseLayerT::ModuleHandleT;

    IRTransformLayer(BaseLayerT &BaseLayer,
                     TransformFtor Transform = TransformFtor())
        : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    Expected<ModuleHandleT>
    addModule(std::shared_ptr<Module> M,
              std::shared_ptr<JITSymbolResolver> Resolver) {
      return BaseLayer.addModule(Transform(std::move(M)), std::move(Resolver));
    }

    void removeModule(ModuleHandleT H) { BaseLayer.removeModule(H); }

    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
    }

    JITSymbol findSymbolIn(ModuleHandleT H, const std::string &Name,
                           bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
    }

    void emitAndFinalize(ModuleHandleT H) {
      BaseLayer.emitAndFinalize(H);
    }

    TransformFtor& getTransform() { return Transform; }

    const TransformFtor& getTransform() const { return Transform; }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
comments. It is a template class with two template arguments: ``BaseLayerT`` and
``TransformFtor`` that provide the type of the base layer and the type of the
"transform functor" (in our case a std::function) respectively. This class is
concerned with two very simple jobs: (1) Running every IR Module that is added
with addModule through the transform functor, and (2) conforming to the ORC
layer interface. The interface consists of one typedef and five methods:

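Because TransformFtor is a template parameter, anything callable with the right
signature works: a std::function, a bare lambda, or a hand-written function
object. The toy model below (not the real ORC classes; ``PrintLayer`` and the
string-based "modules" are invented for illustration) sketches how a layer
template stores an arbitrary transform by value and applies it on the way down
to the base layer:

.. code-block:: c++

  #include <iostream>
  #include <string>
  #include <utility>

  // Toy base "layer": addModule just reports what it received.
  struct PrintLayer {
    std::string addModule(std::string M) { return "added:" + M; }
  };

  // Minimal analogue of IRTransformLayer: holds a reference to the layer
  // below and a transform functor, applying the functor in addModule.
  template <typename BaseLayerT, typename TransformFtor>
  class TransformLayer {
    BaseLayerT &Base;
    TransformFtor Transform;
  public:
    TransformLayer(BaseLayerT &Base, TransformFtor Transform)
        : Base(Base), Transform(std::move(Transform)) {}
    std::string addModule(std::string M) {
      return Base.addModule(Transform(std::move(M)));
    }
  };

  // A hand-written function object works just as well as a lambda.
  struct Shout {
    std::string operator()(std::string M) const { return M + "!"; }
  };

  int main() {
    PrintLayer Base;

    auto Opt = [](std::string M) { return "opt(" + M + ")"; };
    TransformLayer<PrintLayer, decltype(Opt)> L1(Base, Opt);
    std::cout << L1.addModule("f") << "\n"; // prints: added:opt(f)

    TransformLayer<PrintLayer, Shout> L2(Base, Shout{});
    std::cout << L2.addModule("g") << "\n"; // prints: added:g!
  }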
+------------------+-----------------------------------------------------------+
| Interface        | Description                                               |
+==================+===========================================================+
|                  | Provides a handle that can be used to identify a module   |
| ModuleHandleT    | set when calling findSymbolIn, removeModule, or           |
|                  | emitAndFinalize.                                          |
+------------------+-----------------------------------------------------------+
|                  | Takes a given set of Modules and makes them "available    |
|                  | for execution". This means that symbols in those modules  |
|                  | should be searchable via findSymbol and findSymbolIn, and |
|                  | the address of the symbols should be read/writable (for   |
|                  | data symbols), or executable (for function symbols) after |
|                  | JITSymbol::getAddress() is called. Note: This means that  |
| addModule        | addModule doesn't have to compile (or do any other        |
|                  | work) up-front. It *can*, like IRCompileLayer, act        |
|                  | eagerly, but it can also simply record the module and     |
|                  | take no further action until somebody calls               |
|                  | JITSymbol::getAddress(). In IRTransformLayer's case       |
|                  | addModule eagerly applies the transform functor to        |
|                  | each module in the set, then passes the resulting set     |
|                  | of mutated modules down to the layer below.               |
+------------------+-----------------------------------------------------------+
|                  | Removes a set of modules from the JIT. Code or data       |
| removeModule     | defined in these modules will no longer be available, and |
|                  | the memory holding the JIT'd definitions will be freed.   |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in all modules that have    |
|                  | previously been added via addModule (and not yet          |
| findSymbol       | removed by a call to removeModule). In                    |
|                  | IRTransformLayer we just pass the query on to the layer   |
|                  | below. In our REPL this is our default way to search for  |
|                  | function definitions.                                     |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in the module set indicated |
|                  | by the given ModuleHandleT. This is just an optimized     |
|                  | search, better for lookup-speed when you know exactly     |
| findSymbolIn     | where a symbol definition should be found. In             |
|                  | IRTransformLayer we just pass this query on to the layer  |
|                  | below. In our REPL we use this method to search for       |
|                  | functions representing top-level expressions, since we    |
|                  | know exactly where we'll find them: in the top-level      |
|                  | expression module we just added.                          |
+------------------+-----------------------------------------------------------+
|                  | Forces all of the actions required to make the code and   |
|                  | data in a module set (represented by a ModuleHandleT)     |
|                  | accessible. Behaves as if some symbol in the set had been |
|                  | searched for and JITSymbol::getAddress() called. This     |
| emitAndFinalize  | is rarely needed, but can be useful when dealing with     |
|                  | layers that usually behave lazily if the user wants to    |
|                  | trigger early compilation (for example, to use idle CPU   |
|                  | time to eagerly compile code in the background).          |
+------------------+-----------------------------------------------------------+

This interface attempts to capture the natural operations of a JIT (with some
wrinkles like emitAndFinalize for performance), similar to the basic JIT API
operations we identified in Chapter 1. Conforming to the layer concept allows
classes to compose neatly by implementing their behaviors in terms of these
same operations, carried out on the layer below. For example, an eager layer
(like IRTransformLayer) can implement addModule by running each module in the
set through its transform up-front and immediately passing the result to the
layer below. A lazy layer, by contrast, could implement addModule by
squirreling away the modules, doing no other up-front work, and applying the
transform (and calling addModule on the layer below) when the client calls
findSymbol instead. The JIT'd program behavior will be the same either way, but
these choices will have different performance characteristics: Doing work
eagerly means the JIT takes longer up-front, but proceeds smoothly once this is
done. Deferring work allows the JIT to get up-and-running quickly, but will
force the JIT to pause and wait whenever some code or data is needed that hasn't
already been processed.

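The eager/lazy distinction can be sketched without any LLVM machinery. In the
toy model below (all names -- ``ToyModule``, ``ToyBaseLayer``, and the two
transform layers -- are invented for illustration, not ORC API), the eager
layer applies the transform in addModule, while the lazy layer squirrels
modules away and only does the work when findSymbol forces it; both produce
the same result in the end:

.. code-block:: c++

  #include <functional>
  #include <iostream>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Stand-in for a Module: just a name and a payload string.
  struct ToyModule { std::string Name; std::string Body; };

  // Base layer: "compiles" modules by storing a transformed payload.
  class ToyBaseLayer {
    std::map<std::string, std::string> Compiled;
  public:
    void addModule(ToyModule M) { Compiled[M.Name] = "compiled(" + M.Body + ")"; }
    std::string findSymbol(const std::string &Name) { return Compiled[Name]; }
  };

  using Transform = std::function<ToyModule(ToyModule)>;

  // Eager layer: applies the transform in addModule, like IRTransformLayer.
  class EagerTransformLayer {
    ToyBaseLayer &Base; Transform T;
  public:
    EagerTransformLayer(ToyBaseLayer &Base, Transform T)
        : Base(Base), T(std::move(T)) {}
    void addModule(ToyModule M) { Base.addModule(T(std::move(M))); }
    std::string findSymbol(const std::string &Name) { return Base.findSymbol(Name); }
  };

  // Lazy layer: records modules and defers the transform (and the call down
  // to the base layer) until findSymbol forces the work.
  class LazyTransformLayer {
    ToyBaseLayer &Base; Transform T; std::vector<ToyModule> Pending;
  public:
    LazyTransformLayer(ToyBaseLayer &Base, Transform T)
        : Base(Base), T(std::move(T)) {}
    void addModule(ToyModule M) { Pending.push_back(std::move(M)); }
    std::string findSymbol(const std::string &Name) {
      for (auto &M : Pending) // Force all deferred work now.
        Base.addModule(T(std::move(M)));
      Pending.clear();
      return Base.findSymbol(Name);
    }
  };

  int main() {
    Transform Optimize = [](ToyModule M) {
      M.Body = "opt(" + M.Body + ")";
      return M;
    };

    ToyBaseLayer B1;
    EagerTransformLayer Eager(B1, Optimize);
    Eager.addModule({"f", "body"});             // transform runs here
    std::cout << Eager.findSymbol("f") << "\n"; // prints: compiled(opt(body))

    ToyBaseLayer B2;
    LazyTransformLayer Lazy(B2, Optimize);
    Lazy.addModule({"f", "body"});              // nothing happens yet
    std::cout << Lazy.findSymbol("f") << "\n";  // prints: compiled(opt(body))
  }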
Our current REPL is eager: Each function definition is optimized and compiled as
soon as it's typed in. If we were to make the transform layer lazy (but not
change things otherwise) we could defer optimization until the first time we
reference a function in a top-level expression (see if you can figure out why,
then check out the answer below [1]_). In the next chapter, however, we'll
introduce fully lazy compilation, in which functions aren't compiled until
they're first called at run-time. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we'll have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
startup (in which everything is optimized) but relatively short pauses as each
function just passes through code-gen. If we both optimize and code-gen lazily
we can start executing the first function more quickly, but we'll have longer
pauses as each function has to be both optimized and code-gen'd when it's first
executed. Things become even more interesting if we consider interprocedural
optimizations like inlining, which must be performed eagerly. These are
complex trade-offs, and there is no one-size-fits-all solution to them, but by
providing composable layers we leave the decisions to the person implementing
the JIT, and make it easy for them to experiment with different configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++

.. [1] When we add our top-level expression to the JIT, any calls to functions
       that we defined earlier will appear to the RTDyldObjectLinkingLayer as
       external symbols. The RTDyldObjectLinkingLayer will call the SymbolResolver
       that we defined in addModule, which in turn calls findSymbol on the
       OptimizeLayer, at which point even a lazy transform layer will have to
       do its work.