=====================================================================
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*, IRCompileLayer and
RTDyldObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series, LLVM's *FunctionPassManager* is introduced as a means for
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the PassManager on
a Module to mutate it into a (hopefully) more optimized but semantically
equivalent form. In the original tutorial series the FunctionPassManager was
created outside the KaleidoscopeJIT and modules were optimized before being
added to it. In this chapter we will make optimization a phase of our JIT
instead. For now this will provide us a motivation to learn more about ORC
layers, but in the long term making optimization part of our JIT will yield an
important benefit: When we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
IRTransformLayer works in more detail below, but the interface is simple: the
constructor for this layer takes a reference to the layer below (as all layers
do) plus an *IR optimization function* that it will apply to each Module that
is added via addModule:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    std::unique_ptr<TargetMachine> TM;
    const DataLayout DL;
    RTDyldObjectLinkingLayer<> ObjectLayer;
    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;

    using OptimizeFunction =
        std::function<std::shared_ptr<Module>(std::shared_ptr<Module>)>;

    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;

  public:
    using ModuleHandle = decltype(OptimizeLayer)::ModuleHandleT;

    KaleidoscopeJIT()
        : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
          ObjectLayer([]() { return std::make_shared<SectionMemoryManager>(); }),
          CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
          OptimizeLayer(CompileLayer,
                        [this](std::shared_ptr<Module> M) {
                          return optimizeModule(std::move(M));
                        }) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a typedef for our optimization function.
In this case we use a std::function (a handy wrapper for "function-like" things)
from a single std::shared_ptr<Module> input to a std::shared_ptr<Module> output.
With our optimization function typedef in place we can declare our
OptimizeLayer, which sits on top of our CompileLayer.

To initialize our OptimizeLayer we pass it a reference to the CompileLayer
below (standard practice for layers), and we initialize the OptimizeFunction
using a lambda that calls out to an "optimizeModule" function that we will
define below.

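The wiring above can be tried out in isolation. The following is a minimal
sketch, not LLVM code: ``ToyModule`` and ``ToyJIT`` are hypothetical stand-ins
for llvm::Module and KaleidoscopeJIT so that it compiles with only the standard
library. It shows a lambda that captures ``this`` being stored in a
std::function with the same shape as OptimizeFunction, and forwarding to a
private member function.

.. code-block:: c++

  #include <cassert>
  #include <functional>
  #include <memory>
  #include <string>
  #include <utility>

  // Hypothetical stand-in for llvm::Module.
  struct ToyModule {
    std::string Name;
    bool Optimized = false;
  };

  // Same shape as OptimizeFunction: shared_ptr in, shared_ptr out.
  using OptimizeFunction =
      std::function<std::shared_ptr<ToyModule>(std::shared_ptr<ToyModule>)>;

  // Hypothetical stand-in for KaleidoscopeJIT, reduced to the optimizer hook.
  class ToyJIT {
  public:
    // As in KaleidoscopeJIT's constructor, the lambda captures `this` and
    // forwards to a private member function.
    ToyJIT()
        : Optimize([this](std::shared_ptr<ToyModule> M) {
            return optimizeModule(std::move(M));
          }) {}

    // Copying would leave the copy's lambda pointing at the original `this`.
    ToyJIT(const ToyJIT &) = delete;
    ToyJIT &operator=(const ToyJIT &) = delete;

    // Stands in for the point where a layer would invoke its transform.
    std::shared_ptr<ToyModule> transform(std::shared_ptr<ToyModule> M) {
      return Optimize(std::move(M));
    }

    int optimizedCount() const { return Count; }

  private:
    std::shared_ptr<ToyModule> optimizeModule(std::shared_ptr<ToyModule> M) {
      M->Optimized = true; // Placeholder for running real optimization passes.
      ++Count;
      return M;
    }

    int Count = 0;
    OptimizeFunction Optimize;
  };

Note that the lambda's parameter has to be a std::shared_ptr: a lambda taking
std::unique_ptr<ToyModule> could not be stored in this std::function, because a
std::shared_ptr argument cannot be converted to a std::unique_ptr.
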
.. code-block:: c++

  // ...
  auto Resolver = createLambdaResolver(
      [&](const std::string &Name) {
        if (auto Sym = OptimizeLayer.findSymbol(Name, false))
          return Sym;
        return JITSymbol(nullptr);
      },
  // ...

.. code-block:: c++

  // ...
  return cantFail(OptimizeLayer.addModule(std::move(M),
                                          std::move(Resolver)));
  // ...

.. code-block:: c++

  // ...
  return OptimizeLayer.findSymbol(MangledNameStream.str(), true);
  // ...

.. code-block:: c++

  // ...
  cantFail(OptimizeLayer.removeModule(H));
  // ...

The snippets above show the next step: replacing references to CompileLayer
with references to OptimizeLayer in our key methods, addModule, findSymbol, and
removeModule. In addModule we need to be careful to replace both references:
the findSymbol call inside our resolver, and the call through to addModule.

.. code-block:: c++

  std::shared_ptr<Module> optimizeModule(std::shared_ptr<Module> M) {
    // Create a function pass manager.
    auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M)
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual optimization:
*optimizeModule*. This function sets up a FunctionPassManager, adds some passes
to it, runs it over every function in the module, and then returns the mutated
module. The specific optimizations are the same ones used in
`Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series. Readers may visit that chapter for a more in-depth
discussion of these, and of IR optimization in general.

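The overall shape of optimizeModule (configure a pass manager once, then run it
over every function in the module) can also be sketched without LLVM. In the
hypothetical example below, ``ToyFunction``, ``ToyModule``,
``ToyFunctionPassManager`` and both passes are made up for illustration; they
are not LLVM's classes or passes. As with legacy::FunctionPassManager::run,
``run`` here reports whether any pass changed the function, and the passes run
in the order they were added.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <functional>
  #include <utility>
  #include <vector>

  // Hypothetical toy IR: a function body is a list of integers, and 0 plays
  // the role of a no-op instruction.
  struct ToyFunction {
    std::vector<int> Body;
  };

  struct ToyModule {
    std::vector<ToyFunction> Functions;
  };

  // A pass mutates one function and returns true if it changed anything.
  using ToyPass = std::function<bool(ToyFunction &)>;

  class ToyFunctionPassManager {
  public:
    void add(ToyPass P) { Passes.push_back(std::move(P)); }

    // Run every registered pass, in registration order, over one function.
    bool run(ToyFunction &F) {
      bool Changed = false;
      for (auto &P : Passes)
        Changed |= P(F);
      return Changed;
    }

  private:
    std::vector<ToyPass> Passes;
  };

  // Same structure as optimizeModule: configure the manager, then visit
  // every function in the module and return the (mutated) module.
  ToyModule &optimizeToyModule(ToyModule &M) {
    ToyFunctionPassManager FPM;

    // Hypothetical pass 1: delete no-op (zero) instructions.
    FPM.add([](ToyFunction &F) {
      auto OldSize = F.Body.size();
      F.Body.erase(std::remove(F.Body.begin(), F.Body.end(), 0), F.Body.end());
      return F.Body.size() != OldSize;
    });

    // Hypothetical pass 2: collapse runs of identical adjacent instructions.
    FPM.add([](ToyFunction &F) {
      auto OldSize = F.Body.size();
      F.Body.erase(std::unique(F.Body.begin(), F.Body.end()), F.Body.end());
      return F.Body.size() != OldSize;
    });

    for (auto &F : M.Functions)
      FPM.run(F);
    return M;
  }

Because pass 1 runs before pass 2, a body such as ``{1, 0, 1, 2, 0, 2, 3}``
first loses its zeros and then its adjacent duplicates, ending up as
``{1, 2, 3}``; pass order matters in the same way for real pipelines.
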
And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
addModule the OptimizeLayer will call our optimizeModule function before passing
the transformed module on to the CompileLayer below. Of course, we could have
called optimizeModule directly in our addModule function and not gone to the
bother of using the IRTransformLayer, but using the layer gives us another
opportunity to see how layers compose. It also provides a neat entry point to
the *layer* concept itself, because IRTransformLayer turns out to be one of the
simplest implementations of the layer concept that can be devised:

.. code-block:: c++

  template <typename BaseLayerT, typename TransformFtor>
  class IRTransformLayer {
  public:
    using ModuleHandleT = typename BaseLayerT::ModuleHandleT;

    IRTransformLayer(BaseLayerT &BaseLayer,
                     TransformFtor Transform = TransformFtor())
      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    Expected<ModuleHandleT>
    addModule(std::shared_ptr<Module> M,
              std::shared_ptr<JITSymbolResolver> Resolver) {
      return BaseLayer.addModule(Transform(std::move(M)), std::move(Resolver));
    }

    Error removeModule(ModuleHandleT H) { return BaseLayer.removeModule(H); }

    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
    }

    JITSymbol findSymbolIn(ModuleHandleT H, const std::string &Name,
                           bool ExportedSymbolsOnly) {
      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
    }

    Error emitAndFinalize(ModuleHandleT H) {
      return BaseLayer.emitAndFinalize(H);
    }

    TransformFtor& getTransform() { return Transform; }

    const TransformFtor& getTransform() const { return Transform; }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
comments. It is a template class with two template arguments, ``BaseLayerT`` and
``TransformFtor``, which provide the type of the base layer and the type of the
"transform functor" (in our case a std::function) respectively. This class is
concerned with two very simple jobs: (1) Running every IR Module that is added
with addModule through the transform functor, and (2) conforming to the ORC
layer interface. The interface consists of one typedef and five methods:

+------------------+-----------------------------------------------------------+
| Interface        | Description                                               |
+==================+===========================================================+
|                  | Provides a handle that can be used to identify a module   |
| ModuleHandleT    | when calling findSymbolIn, removeModule, or               |
|                  | emitAndFinalize.                                          |
+------------------+-----------------------------------------------------------+
|                  | Takes a given Module and makes it "available for          |
|                  | execution." This means that symbols in that module should |
|                  | be searchable via findSymbol and findSymbolIn, and the    |
|                  | addresses of those symbols should be read/writable (for   |
|                  | data symbols) or executable (for function symbols) after  |
|                  | JITSymbol::getAddress() is called. Note: This means that  |
| addModule        | addModule doesn't have to compile (or do any other work)  |
|                  | up-front. It *can*, like IRCompileLayer, act eagerly, but |
|                  | it can also simply record the module and take no further  |
|                  | action until somebody calls JITSymbol::getAddress(). In   |
|                  | IRTransformLayer's case addModule eagerly applies the     |
|                  | transform functor to the module, then passes the          |
|                  | transformed module down to the layer below.               |
+------------------+-----------------------------------------------------------+
|                  | Removes a module from the JIT. Code or data defined in    |
| removeModule     | this module will no longer be available, and the memory   |
|                  | holding the JIT'd definitions will be freed.              |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in all modules that have    |
|                  | previously been added via addModule (and not yet removed  |
| findSymbol       | by a call to removeModule). In IRTransformLayer we just   |
|                  | pass the query on to the layer below. In our REPL this is |
|                  | our default way to search for function definitions.       |
+------------------+-----------------------------------------------------------+
|                  | Searches for the named symbol in the module indicated by  |
|                  | the given ModuleHandleT. This is just an optimized        |
|                  | search, better for lookup speed when you know exactly     |
|                  | where a symbol definition should be found. In             |
| findSymbolIn     | IRTransformLayer we just pass this query on to the layer  |
|                  | below. In our REPL we use this method to search for       |
|                  | functions representing top-level expressions, since we    |
|                  | know exactly where we'll find them: in the top-level      |
|                  | expression module we just added.                          |
+------------------+-----------------------------------------------------------+
|                  | Forces all of the actions required to make the code and   |
|                  | data in a module (represented by a ModuleHandleT)         |
|                  | accessible. Behaves as if some symbol in the module had   |
| emitAndFinalize  | been searched for and JITSymbol::getAddress() called.     |
|                  | This is rarely needed, but can be useful when dealing     |
|                  | with layers that usually behave lazily if the user wants  |
|                  | to trigger early compilation (for example, to use idle    |
|                  | CPU time to eagerly compile code in the background).      |
+------------------+-----------------------------------------------------------+

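To make the layer interface concrete without LLVM, here is a hypothetical
miniature stack. ``ToyBaseLayer`` stands in for a layer such as IRCompileLayer
(it only records modules and answers lookups), and ``ToyTransformLayer`` follows
the same pattern as IRTransformLayer: transform in addModule, forward
everything else. As simplifying assumptions, a module is just a set of symbol
names, lookups return ``bool`` rather than a JITSymbol, and there is no error
handling, ExportedSymbolsOnly flag, or emitAndFinalize.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <memory>
  #include <set>
  #include <string>
  #include <utility>

  // Hypothetical module: the set of symbol names it defines.
  struct ToyModule {
    std::set<std::string> Symbols;
  };

  // Hypothetical bottom layer: records modules and answers lookups.
  class ToyBaseLayer {
  public:
    using ModuleHandleT = int;

    ModuleHandleT addModule(std::shared_ptr<ToyModule> M) {
      Modules[NextHandle] = std::move(M);
      return NextHandle++;
    }
    void removeModule(ModuleHandleT H) { Modules.erase(H); }
    bool findSymbol(const std::string &Name) const {
      for (const auto &Entry : Modules)
        if (Entry.second->Symbols.count(Name))
          return true;
      return false;
    }
    bool findSymbolIn(ModuleHandleT H, const std::string &Name) const {
      auto It = Modules.find(H);
      return It != Modules.end() && It->second->Symbols.count(Name);
    }

  private:
    std::map<ModuleHandleT, std::shared_ptr<ToyModule>> Modules;
    ModuleHandleT NextHandle = 0;
  };

  // Same structure as IRTransformLayer: transform on add, forward the rest.
  template <typename BaseLayerT, typename TransformFtor>
  class ToyTransformLayer {
  public:
    using ModuleHandleT = typename BaseLayerT::ModuleHandleT;

    ToyTransformLayer(BaseLayerT &BaseLayer, TransformFtor Transform)
        : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    ModuleHandleT addModule(std::shared_ptr<ToyModule> M) {
      return BaseLayer.addModule(Transform(std::move(M)));
    }
    void removeModule(ModuleHandleT H) { BaseLayer.removeModule(H); }
    bool findSymbol(const std::string &Name) const {
      return BaseLayer.findSymbol(Name);
    }
    bool findSymbolIn(ModuleHandleT H, const std::string &Name) const {
      return BaseLayer.findSymbolIn(H, Name);
    }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
  };

Because ToyTransformLayer relies only on its base layer's methods and
ModuleHandleT, another layer could be stacked on top of it in exactly the same
way.
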
This interface attempts to capture the natural operations of a JIT (with some
wrinkles like emitAndFinalize for performance), similar to the basic JIT API
operations we identified in Chapter 1. Conforming to the layer concept allows
classes to compose neatly by implementing their behaviors in terms of these
same operations, carried out on the layer below. For example, an eager layer
(like IRTransformLayer) can implement addModule by running each module through
its transform up-front and immediately passing the result to the layer below. A
lazy layer, by contrast, could implement addModule by squirreling away the
modules and doing no other up-front work, applying the transform (and calling
addModule on the layer below) only when the client calls findSymbol. The JIT'd
program behavior will be the same either way, but these choices will have
different performance characteristics: Doing work eagerly means the JIT takes
longer up-front, but proceeds smoothly once this is done. Deferring work allows
the JIT to get up and running quickly, but will force the JIT to pause and wait
whenever some code or data is needed that hasn't already been processed.

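The lazy strategy described above can be sketched in the same toy style.
``ToyLazyTransformLayer`` is hypothetical, not an ORC class: addModule only
queues the module, and the next findSymbol runs every queued module through the
transform and hands it to the base layer. A real lazy layer would more likely
materialize only the module that defines the requested symbol; flushing
everything keeps the sketch short.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <memory>
  #include <set>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical module: the set of symbol names it defines.
  struct ToyModule {
    std::set<std::string> Symbols;
  };

  // Hypothetical base layer that simply records every symbol it receives.
  struct ToyBaseLayer {
    std::set<std::string> Defined;

    void addModule(std::shared_ptr<ToyModule> M) {
      Defined.insert(M->Symbols.begin(), M->Symbols.end());
    }
    bool findSymbol(const std::string &Name) const {
      return Defined.count(Name) != 0;
    }
  };

  // Hypothetical lazy counterpart of a transform layer.
  template <typename BaseLayerT, typename TransformFtor>
  class ToyLazyTransformLayer {
  public:
    ToyLazyTransformLayer(BaseLayerT &BaseLayer, TransformFtor Transform)
        : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

    // Lazy: only remember the module. No transform runs, nothing goes down.
    void addModule(std::shared_ptr<ToyModule> M) {
      Pending.push_back(std::move(M));
    }

    // A lookup pays for any deferred work before querying the layer below.
    bool findSymbol(const std::string &Name) {
      for (auto &M : Pending)
        BaseLayer.addModule(Transform(std::move(M)));
      Pending.clear();
      return BaseLayer.findSymbol(Name);
    }

    std::size_t pendingCount() const { return Pending.size(); }

  private:
    BaseLayerT &BaseLayer;
    TransformFtor Transform;
    std::vector<std::shared_ptr<ToyModule>> Pending;
  };

With a transform that counts its calls, adding a module leaves the count at
zero; the first findSymbol raises it to one, and later lookups do not repeat
the work. A resolver that calls findSymbol on such a layer therefore forces the
deferred transform to run, which is the situation described in the footnote at
the end of this chapter.
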
Our current REPL is eager: Each function definition is optimized and compiled as
soon as it's typed in. If we were to make the transform layer lazy (but not
change things otherwise) we could defer optimization until the first time we
reference a function in a top-level expression (see if you can figure out why,
then check out the answer below [1]_). In the next chapter, however, we'll
introduce fully lazy compilation, in which functions aren't compiled until
they're first called at run-time. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we'll have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
startup (while everything is optimized) but relatively short pauses as each
function just passes through code-gen. If we both optimize and code-gen lazily
we can start executing the first function more quickly, but we'll have longer
pauses as each function has to be both optimized and code-gen'd when it's first
executed. Things become even more interesting if we consider interprocedural
optimizations like inlining, which must be performed eagerly. These are
complex trade-offs, and there is no one-size-fits-all solution to them, but by
providing composable layers we leave the decisions to the person implementing
the JIT, and make it easy for them to experiment with different configurations.

`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++

.. [1] When we add our top-level expression to the JIT, any calls to functions
       that we defined earlier will appear to the RTDyldObjectLinkingLayer as
       external symbols. The RTDyldObjectLinkingLayer will call the SymbolResolver
       that we defined in addModule, which in turn calls findSymbol on the
       OptimizeLayer, at which point even a lazy transform layer will have to
       do its work.