=====================================================================
Building a JIT: Adding Optimizations -- An introduction to ORC Layers
=====================================================================

.. contents::
   :local:

**This tutorial is under active development. It is incomplete and details may
change frequently.** Nonetheless we invite you to try it out as it stands, and
we welcome any feedback.

Chapter 2 Introduction
======================

**Warning: This tutorial is currently being updated to account for ORC API
changes. Only Chapters 1 and 2 are up-to-date.**

**Example code from Chapters 3 to 5 will compile and run, but has not been
updated**

Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
executable code in memory. KaleidoscopeJIT was able to do this with relatively
little code by composing two off-the-shelf *ORC layers*, IRCompileLayer and
ObjectLinkingLayer, to do much of the heavy lifting.

In this chapter we'll learn more about the ORC layer concept by using a new
layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.

Optimizing Modules using the IRTransformLayer
=============================================

In `Chapter 4 <LangImpl04.html>`_ of the "Implementing a language with LLVM"
tutorial series the LLVM *FunctionPassManager* is introduced as a means of
optimizing LLVM IR. Interested readers may read that chapter for details, but
in short: to optimize a Module we create an llvm::FunctionPassManager
instance, configure it with a set of optimizations, then run the pass manager
over the Module's functions to mutate the Module into a (hopefully) more
optimized but semantically equivalent form. In the original tutorial series the
FunctionPassManager was created outside the KaleidoscopeJIT and modules were
optimized before being added to it. In this chapter we will make optimization a
phase of our JIT instead. For now this gives us a reason to learn more about
ORC layers, but in the long term making optimization part of our JIT will yield
an important benefit: when we begin lazily compiling code (i.e. deferring
compilation of each function until the first time it's run), having
optimization managed by our JIT will allow us to optimize lazily too, rather
than having to do all our optimization up-front.

To add optimization support to our JIT we will take the KaleidoscopeJIT from
Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the
IRTransformLayer works in more detail below, but the interface is simple: the
constructor for this layer takes a reference to the execution session and the
layer below (as all layers do) plus an *IR optimization function* that it will
apply to each Module that is added via addModule:

.. code-block:: c++

  class KaleidoscopeJIT {
  private:
    ExecutionSession ES;
    RTDyldObjectLinkingLayer ObjectLayer;
    IRCompileLayer CompileLayer;
    IRTransformLayer OptimizeLayer;

    DataLayout DL;
    MangleAndInterner Mangle;
    ThreadSafeContext Ctx;

  public:

    KaleidoscopeJIT(JITTargetMachineBuilder JTMB, DataLayout DL)
        : ObjectLayer(ES,
                      []() { return std::make_unique<SectionMemoryManager>(); }),
          CompileLayer(ES, ObjectLayer, ConcurrentIRCompiler(std::move(JTMB))),
          OptimizeLayer(ES, CompileLayer, optimizeModule),
          DL(std::move(DL)), Mangle(ES, this->DL),
          Ctx(std::make_unique<LLVMContext>()) {
      // Use the member (this->DL): the parameter DL was passed to std::move
      // in the initializer list above.
      ES.getMainJITDylib().addGenerator(
          cantFail(DynamicLibrarySearchGenerator::GetForCurrentProcess(
              this->DL.getGlobalPrefix())));
    }

Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1,
but after the CompileLayer we introduce a new member, OptimizeLayer, which sits
on top of our CompileLayer. We initialize our OptimizeLayer with a reference to
the ExecutionSession and output layer (standard practice for layers), along with
a *transform function*. For our transform function we supply our class's
optimizeModule static method.

.. code-block:: c++

  // ...
  return OptimizeLayer.add(ES.getMainJITDylib(),
                           ThreadSafeModule(std::move(M), Ctx));
  // ...

Next we need to update our addModule method to replace the call to
``CompileLayer::add`` with a call to ``OptimizeLayer::add`` instead.

.. code-block:: c++

  static Expected<ThreadSafeModule>
  optimizeModule(ThreadSafeModule M, const MaterializationResponsibility &R) {
    // Create a function pass manager for the wrapped llvm::Module.
    auto FPM = std::make_unique<legacy::FunctionPassManager>(M.getModule());

    // Add some optimizations.
    FPM->add(createInstructionCombiningPass());
    FPM->add(createReassociatePass());
    FPM->add(createGVNPass());
    FPM->add(createCFGSimplificationPass());
    FPM->doInitialization();

    // Run the optimizations over all functions in the module being added to
    // the JIT.
    for (auto &F : *M.getModule())
      FPM->run(F);

    return M;
  }

At the bottom of our JIT we add a private method to do the actual optimization:
*optimizeModule*. This function takes the module to be transformed as input (as
a ThreadSafeModule) along with a reference to a new class:
``MaterializationResponsibility``. The MaterializationResponsibility argument
can be used to query JIT state for the module being transformed, such as the set
of definitions in the module that JIT'd code is actively trying to call/access.
For now we will ignore this argument and use a standard optimization
pipeline. To do this we set up a FunctionPassManager, add some passes to it, run
it over every function in the module, and then return the mutated module. The
specific optimizations are the same ones used in `Chapter 4 <LangImpl04.html>`_
of the "Implementing a language with LLVM" tutorial series. Readers may visit
that chapter for a more in-depth discussion of these, and of IR optimization in
general.

And that's it in terms of changes to KaleidoscopeJIT: When a module is added via
addModule the OptimizeLayer will call our optimizeModule function before passing
the transformed module on to the CompileLayer below. Of course, we could have
called optimizeModule directly in our addModule function and not gone to the
bother of using the IRTransformLayer, but doing so gives us another opportunity
to see how layers compose. It also provides a neat entry point to the *layer*
concept itself, because IRTransformLayer is one of the simplest layers that
can be implemented.

.. code-block:: c++

  // From IRTransformLayer.h:
  class IRTransformLayer : public IRLayer {
  public:
    using TransformFunction = std::function<Expected<ThreadSafeModule>(
        ThreadSafeModule, const MaterializationResponsibility &R)>;

    IRTransformLayer(ExecutionSession &ES, IRLayer &BaseLayer,
                     TransformFunction Transform = identityTransform);

    void setTransform(TransformFunction Transform) {
      this->Transform = std::move(Transform);
    }

    static ThreadSafeModule
    identityTransform(ThreadSafeModule TSM,
                      const MaterializationResponsibility &R) {
      return TSM;
    }

    void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override;

  private:
    IRLayer &BaseLayer;
    TransformFunction Transform;
  };

  // From IRTransformLayer.cpp:

  IRTransformLayer::IRTransformLayer(ExecutionSession &ES,
                                     IRLayer &BaseLayer,
                                     TransformFunction Transform)
      : IRLayer(ES), BaseLayer(BaseLayer), Transform(std::move(Transform)) {}

  void IRTransformLayer::emit(MaterializationResponsibility R,
                              ThreadSafeModule TSM) {
    assert(TSM.getModule() && "Module must not be null");

    if (auto TransformedTSM = Transform(std::move(TSM), R))
      BaseLayer.emit(std::move(R), std::move(*TransformedTSM));
    else {
      R.failMaterialization();
      getExecutionSession().reportError(TransformedTSM.takeError());
    }
  }

This is the whole definition of IRTransformLayer, from
``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h`` and
``llvm/lib/ExecutionEngine/Orc/IRTransformLayer.cpp``. This class is concerned
with two very simple jobs: (1) running every IR Module that is emitted via this
layer through the transform function object, and (2) implementing the ORC
``IRLayer`` interface (which itself conforms to the general ORC Layer concept,
more on that below). Most of the class is straightforward: a type alias for the
transform function, a constructor to initialize the members, a setter for the
transform function value, and a default no-op transform. The most important
method is ``emit``, as this is half of our IRLayer interface. The emit method
applies our transform to each module that it is called on and, if the transform
succeeds, passes the transformed module to the base layer. If the transform
fails, our emit function calls
``MaterializationResponsibility::failMaterialization`` (this lets JIT clients
who may be waiting on other threads know that the code they were waiting for
has failed to compile) and logs the error with the execution session before
bailing out.

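The success and failure paths of ``emit`` can be tried out without LLVM using
the following minimal sketch. It is not the ORC API: ``ToyExpected``,
``ToyResponsibility``, ``ToyEmitResult``, ``toyEmit`` and ``tagNonEmpty`` are
hypothetical names invented for illustration, a "module" is just a string, and
the base layer and execution session are modeled as two vectors.

.. code-block:: c++

  #include <cassert>
  #include <functional>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for Expected<T>: a value, or an error message.
  template <typename T> struct ToyExpected {
    std::optional<T> Value;
    std::string Error;
    explicit operator bool() const { return Value.has_value(); }
  };

  // Hypothetical stand-in for MaterializationResponsibility.
  struct ToyResponsibility {
    bool Failed = false;
    void failMaterialization() { Failed = true; }
  };

  struct ToyEmitResult {
    bool Failed;                      // was failMaterialization called?
    std::vector<std::string> Emitted; // modules handed to the "base layer"
    std::vector<std::string> Errors;  // errors reported to the "session"
  };

  using ToyTransform = std::function<ToyExpected<std::string>(std::string)>;

  // Follows the same control flow as IRTransformLayer::emit, on toy types.
  ToyEmitResult toyEmit(const ToyTransform &Transform, std::string Module) {
    assert(Transform && "transform must be callable");
    ToyResponsibility R;
    ToyEmitResult Result{false, {}, {}};
    if (auto Transformed = Transform(std::move(Module)))
      Result.Emitted.push_back(std::move(*Transformed.Value));
    else {
      R.failMaterialization();
      Result.Errors.push_back(Transformed.Error);
    }
    Result.Failed = R.Failed;
    return Result;
  }

  // An example transform: rejects empty modules and tags everything else.
  ToyExpected<std::string> tagNonEmpty(std::string M) {
    if (M.empty())
      return {std::nullopt, "empty module"};
    return {M + " [transformed]", ""};
  }

With these definitions, ``toyEmit(tagNonEmpty, "main")`` forwards
``"main [transformed]"`` to the base layer and reports no failure, while
``toyEmit(tagNonEmpty, "")`` forwards nothing, marks materialization as failed,
and records the error ``"empty module"``.
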
The other half of the IRLayer interface we inherit unmodified from the IRLayer
class:

.. code-block:: c++

  Error IRLayer::add(JITDylib &JD, ThreadSafeModule TSM, VModuleKey K) {
    return JD.define(std::make_unique<BasicIRLayerMaterializationUnit>(
        *this, std::move(K), std::move(TSM)));
  }

This code, from ``llvm/lib/ExecutionEngine/Orc/Layer.cpp``, adds a
ThreadSafeModule to a given JITDylib by wrapping it up in a
``MaterializationUnit`` (in this case a ``BasicIRLayerMaterializationUnit``).
Most layers that derive from IRLayer can rely on this default implementation
of the ``add`` method.

These two operations, ``add`` and ``emit``, together constitute the layer
concept: a layer is a way to wrap a part of a compiler pipeline (in this case
the "opt" phase of an LLVM compiler) whose API is opaque to ORC with an
interface that ORC can call as needed. The add method takes a
module in some input program representation (in this case an LLVM IR module) and
stores it in the target JITDylib, arranging for it to be passed back to the
layer's emit method when any symbol defined by that module is requested. Layers
can compose neatly by calling the ``emit`` method of a base layer to complete
their work. For example, in this tutorial our IRTransformLayer calls through to
our IRCompileLayer to compile the transformed IR, and our IRCompileLayer in turn
calls our ObjectLayer to link the object file produced by our compiler.

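To make the ``add``/``emit`` contract concrete, here is a minimal,
self-contained sketch that models it with toy types. It does not use the ORC
API: ``ToyModule``, ``ToyDylib``, ``ToyLayer``, ``ToyCompileLayer``,
``ToyTransformLayer`` and ``runLayerDemo`` are hypothetical stand-ins, a
"module" is just a list of symbol names plus some text, and "compiling" merely
appends to a log. The only thing it is meant to show is the control flow:
``add`` stores the module, and ``emit`` runs later, when one of the module's
symbols is first looked up.

.. code-block:: c++

  #include <cassert>
  #include <functional>
  #include <map>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for an IR module: its symbols plus a body.
  struct ToyModule {
    std::vector<std::string> Symbols;
    std::string Body;
  };

  class ToyLayer;

  // Hypothetical stand-in for a JITDylib: remembers which layer should emit
  // each module and triggers that emit on the first lookup of a symbol.
  class ToyDylib {
  public:
    void define(ToyLayer &L, ToyModule M);
    // Returns true if this lookup triggered an emit.
    bool lookup(const std::string &Name);

  private:
    struct Pending {
      ToyLayer *Layer;
      ToyModule Module;
    };
    std::map<std::string, std::shared_ptr<Pending>> Unmaterialized;
  };

  // The two-operation layer interface: add (store) and emit (do the work).
  class ToyLayer {
  public:
    virtual ~ToyLayer() = default;
    void add(ToyDylib &JD, ToyModule M) { JD.define(*this, std::move(M)); }
    virtual void emit(ToyModule M) = 0;
  };

  void ToyDylib::define(ToyLayer &L, ToyModule M) {
    auto P = std::make_shared<Pending>(Pending{&L, std::move(M)});
    for (const auto &S : P->Module.Symbols)
      Unmaterialized[S] = P;
  }

  bool ToyDylib::lookup(const std::string &Name) {
    auto It = Unmaterialized.find(Name);
    if (It == Unmaterialized.end())
      return false; // Unknown symbol, or its module was already emitted.
    auto P = It->second;
    for (const auto &S : P->Module.Symbols)
      Unmaterialized.erase(S);
    P->Layer->emit(std::move(P->Module));
    return true;
  }

  // Bottom layer: "compiles" a module by appending to a log.
  class ToyCompileLayer : public ToyLayer {
  public:
    explicit ToyCompileLayer(std::vector<std::string> &Log) : Log(Log) {}
    void emit(ToyModule M) override { Log.push_back("compiled: " + M.Body); }

  private:
    std::vector<std::string> &Log;
  };

  // Transform layer: applies a function, then calls emit on its base layer.
  class ToyTransformLayer : public ToyLayer {
  public:
    using TransformFunction = std::function<ToyModule(ToyModule)>;
    ToyTransformLayer(ToyLayer &Base, TransformFunction T)
        : Base(Base), Transform(std::move(T)) {}
    void emit(ToyModule M) override { Base.emit(Transform(std::move(M))); }

  private:
    ToyLayer &Base;
    TransformFunction Transform;
  };

  // Builds the two-layer stack, adds one module, and returns the log.
  std::vector<std::string> runLayerDemo() {
    std::vector<std::string> Log;
    ToyDylib JD;
    ToyCompileLayer Compile(Log);
    ToyTransformLayer Optimize(Compile, [](ToyModule M) {
      M.Body += " (optimized)";
      return M;
    });

    Optimize.add(JD, ToyModule{{"foo", "bar"}, "foo+bar"});
    assert(Log.empty() && "add must not emit anything by itself");

    JD.lookup("foo"); // First request for any symbol emits the whole module.
    JD.lookup("bar"); // Emitted together with foo, so nothing new happens.
    return Log;
  }

Calling ``runLayerDemo()`` returns a log with a single entry,
``"compiled: foo+bar (optimized)"``: the module is emitted once, on the first
lookup, and passes through the transform before it reaches the compile layer.
In this sketch the dylib calls ``emit`` directly; the real ORC machinery goes
through a ``MaterializationUnit`` and a ``MaterializationResponsibility``, which
the toy model leaves out.
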
So far we have learned how to optimize and compile our LLVM IR, but we have not
focused on when compilation happens. Our current REPL is eager: each function
definition is optimized and compiled as soon as it is referenced by any other
code, regardless of whether it is ever called at runtime. In the next chapter we
will introduce fully lazy compilation, in which functions are not compiled until
they are first called at runtime. At this point the trade-offs get much more
interesting: the lazier we are, the quicker we can start executing the first
function, but the more often we will have to pause to compile newly encountered
functions. If we only code-gen lazily, but optimize eagerly, we will have a
longer startup time (as everything is optimized) but relatively short pauses as
each function just passes through code-gen. If we both optimize and code-gen
lazily we can start executing the first function more quickly, but we will have
longer pauses as each function has to be both optimized and code-gen'd when it
is first executed. Things become even more interesting if we consider
interprocedural optimizations like inlining, which must be performed eagerly.
These are complex trade-offs, and there is no one-size-fits-all solution to
them, but by providing composable layers we leave the decisions to the person
implementing the JIT, and make it easy for them to experiment with different
configurations.

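The startup-versus-pause trade-off described above can be made concrete with a
deliberately crude cost model. Everything in the following sketch is an
illustrative assumption rather than a measurement or an ORC facility: the names
``Workload``, ``Costs``, ``StrategyCosts`` and ``compareStrategies`` are
hypothetical, every function is assumed to cost the same to optimize and to
code-gen, and interprocedural effects such as inlining are ignored.

.. code-block:: c++

  #include <cassert>

  // Illustrative inputs only; costs are in arbitrary units.
  struct Workload {
    int NumFunctions; // functions defined in the program
    int NumCalled;    // functions actually called at runtime
    int OptimizeCost; // cost to optimize one function
    int CodeGenCost;  // cost to code-gen one function
  };

  struct Costs {
    int Startup; // work done before the first function can run
    int Pause;   // work done when an uncompiled function is first called
    int Total;   // all compilation work over the whole run
  };

  struct StrategyCosts {
    Costs EagerEverything;     // optimize and code-gen everything up-front
    Costs EagerOptLazyCodeGen; // optimize up-front, code-gen on first call
    Costs LazyEverything;      // optimize and code-gen on first call
  };

  StrategyCosts compareStrategies(const Workload &W) {
    assert(W.NumCalled >= 1 && W.NumCalled <= W.NumFunctions);
    int PerFunction = W.OptimizeCost + W.CodeGenCost;
    int AllOpt = W.NumFunctions * W.OptimizeCost;

    StrategyCosts S;
    S.EagerEverything = {W.NumFunctions * PerFunction, 0,
                         W.NumFunctions * PerFunction};
    S.EagerOptLazyCodeGen = {AllOpt + W.CodeGenCost, W.CodeGenCost,
                             AllOpt + W.NumCalled * W.CodeGenCost};
    S.LazyEverything = {PerFunction, PerFunction, W.NumCalled * PerFunction};
    return S;
  }

For a hypothetical workload of 100 functions of which 10 are called, with an
optimize cost of 3 and a code-gen cost of 2, ``compareStrategies`` gives a
startup/pause/total of 500/0/500 for the fully eager strategy, 302/2/320 for
eager optimization with lazy code-gen, and 5/5/50 for the fully lazy strategy.
The numbers only illustrate the shape of the trade-off; real costs vary per
function and per optimization pipeline.
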
`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_

Full Code Listing
=================

Here is the complete code listing for our running example with an
IRTransformLayer added to enable optimization. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
   :language: c++