120
|
1 =====================================
|
|
2 Coroutines in LLVM
|
|
3 =====================================
|
|
4
|
|
5 .. contents::
|
|
6 :local:
|
|
7 :depth: 3
|
|
8
|
|
9 .. warning::
|
|
10 This is a work in progress. Compatibility across LLVM releases is not
|
|
11 guaranteed.
|
|
12
|
|
13 Introduction
|
|
14 ============
|
|
15
|
|
16 .. _coroutine handle:
|
|
17
|
|
18 LLVM coroutines are functions that have one or more `suspend points`_.
|
|
19 When a suspend point is reached, the execution of a coroutine is suspended and
|
|
20 control is returned back to its caller. A suspended coroutine can be resumed
|
|
21 to continue execution from the last suspend point or it can be destroyed.
|
|
22
|
|
23 In the following example, we call function `f` (which may or may not be a
|
|
24 coroutine itself) that returns a handle to a suspended coroutine
|
|
25 (**coroutine handle**) that is used by `main` to resume the coroutine twice and
|
|
26 then destroy it:
|
|
27
|
|
28 .. code-block:: llvm
|
|
29
|
|
30 define i32 @main() {
|
|
31 entry:
|
|
32 %hdl = call i8* @f(i32 4)
|
|
33 call void @llvm.coro.resume(i8* %hdl)
|
|
34 call void @llvm.coro.resume(i8* %hdl)
|
|
35 call void @llvm.coro.destroy(i8* %hdl)
|
|
36 ret i32 0
|
|
37 }
|
|
38
|
|
39 .. _coroutine frame:
|
|
40
|
|
41 In addition to the function stack frame which exists when a coroutine is
|
|
42 executing, there is an additional region of storage that contains objects that
|
|
43 keep the coroutine state when a coroutine is suspended. This region of storage
|
|
44 is called **coroutine frame**. It is created when a coroutine is called and
|
|
45 destroyed when a coroutine runs to completion or destroyed by a call to
|
|
46 the `coro.destroy`_ intrinsic.
|
|
47
|
|
48 An LLVM coroutine is represented as an LLVM function that has calls to
|
|
49 `coroutine intrinsics`_ defining the structure of the coroutine.
|
|
50 After lowering, a coroutine is split into several
|
|
51 functions that represent three different ways of how control can enter the
|
|
52 coroutine:
|
|
53
|
|
54 1. a ramp function, which represents an initial invocation of the coroutine that
|
|
55 creates the coroutine frame and executes the coroutine code until it
|
|
56 encounters a suspend point or reaches the end of the function;
|
|
57
|
|
58 2. a coroutine resume function that is invoked when the coroutine is resumed;
|
|
59
|
|
60 3. a coroutine destroy function that is invoked when the coroutine is destroyed.
|
|
61
|
|
62 .. note:: Splitting out resume and destroy functions are just one of the
|
|
63 possible ways of lowering the coroutine. We chose it for initial
|
|
64 implementation as it matches closely the mental model and results in
|
|
65 reasonably nice code.
|
|
66
|
|
67 Coroutines by Example
|
|
68 =====================
|
|
69
|
|
70 Coroutine Representation
|
|
71 ------------------------
|
|
72
|
|
73 Let's look at an example of an LLVM coroutine with the behavior sketched
|
|
74 by the following pseudo-code.
|
|
75
|
|
76 .. code-block:: c++
|
|
77
|
|
78 void *f(int n) {
|
|
79 for(;;) {
|
|
80 print(n++);
|
|
81 <suspend> // returns a coroutine handle on first suspend
|
|
82 }
|
|
83 }
|
|
84
|
|
85 This coroutine calls some function `print` with value `n` as an argument and
|
|
86 suspends execution. Every time this coroutine resumes, it calls `print` again with an argument one bigger than the last time. This coroutine never completes by itself and must be destroyed explicitly. If we use this coroutine with
|
|
87 a `main` shown in the previous section. It will call `print` with values 4, 5
|
|
88 and 6 after which the coroutine will be destroyed.
|
|
89
|
|
90 The LLVM IR for this coroutine looks like this:
|
|
91
|
121
|
92 .. code-block:: llvm
|
120
|
93
|
|
94 define i8* @f(i32 %n) {
|
|
95 entry:
|
|
96 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
|
|
97 %size = call i32 @llvm.coro.size.i32()
|
|
98 %alloc = call i8* @malloc(i32 %size)
|
|
99 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
|
|
100 br label %loop
|
|
101 loop:
|
|
102 %n.val = phi i32 [ %n, %entry ], [ %inc, %loop ]
|
|
103 %inc = add nsw i32 %n.val, 1
|
|
104 call void @print(i32 %n.val)
|
|
105 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
|
|
106 switch i8 %0, label %suspend [i8 0, label %loop
|
|
107 i8 1, label %cleanup]
|
|
108 cleanup:
|
|
109 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
|
|
110 call void @free(i8* %mem)
|
|
111 br label %suspend
|
|
112 suspend:
|
121
|
113 %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
|
120
|
114 ret i8* %hdl
|
|
115 }
|
|
116
|
|
117 The `entry` block establishes the coroutine frame. The `coro.size`_ intrinsic is
|
|
118 lowered to a constant representing the size required for the coroutine frame.
|
|
119 The `coro.begin`_ intrinsic initializes the coroutine frame and returns the
|
|
120 coroutine handle. The second parameter of `coro.begin` is given a block of memory
|
|
121 to be used if the coroutine frame needs to be allocated dynamically.
|
|
122 The `coro.id`_ intrinsic serves as coroutine identity useful in cases when the
|
|
123 `coro.begin`_ intrinsic get duplicated by optimization passes such as
|
|
124 jump-threading.
|
|
125
|
|
126 The `cleanup` block destroys the coroutine frame. The `coro.free`_ intrinsic,
|
|
127 given the coroutine handle, returns a pointer of the memory block to be freed or
|
|
128 `null` if the coroutine frame was not allocated dynamically. The `cleanup`
|
|
129 block is entered when coroutine runs to completion by itself or destroyed via
|
|
130 call to the `coro.destroy`_ intrinsic.
|
|
131
|
|
132 The `suspend` block contains code to be executed when coroutine runs to
|
|
133 completion or suspended. The `coro.end`_ intrinsic marks the point where
|
|
134 a coroutine needs to return control back to the caller if it is not an initial
|
|
135 invocation of the coroutine.
|
|
136
|
|
137 The `loop` blocks represents the body of the coroutine. The `coro.suspend`_
|
|
138 intrinsic in combination with the following switch indicates what happens to
|
|
139 control flow when a coroutine is suspended (default case), resumed (case 0) or
|
|
140 destroyed (case 1).
|
|
141
|
|
142 Coroutine Transformation
|
|
143 ------------------------
|
|
144
|
|
145 One of the steps of coroutine lowering is building the coroutine frame. The
|
|
146 def-use chains are analyzed to determine which objects need be kept alive across
|
|
147 suspend points. In the coroutine shown in the previous section, use of virtual register
|
|
148 `%n.val` is separated from the definition by a suspend point, therefore, it
|
|
149 cannot reside on the stack frame since the latter goes away once the coroutine
|
|
150 is suspended and control is returned back to the caller. An i32 slot is
|
|
151 allocated in the coroutine frame and `%n.val` is spilled and reloaded from that
|
|
152 slot as needed.
|
|
153
|
|
154 We also store addresses of the resume and destroy functions so that the
|
|
155 `coro.resume` and `coro.destroy` intrinsics can resume and destroy the coroutine
|
|
156 when its identity cannot be determined statically at compile time. For our
|
|
157 example, the coroutine frame will be:
|
|
158
|
121
|
159 .. code-block:: llvm
|
120
|
160
|
|
161 %f.frame = type { void (%f.frame*)*, void (%f.frame*)*, i32 }
|
|
162
|
|
163 After resume and destroy parts are outlined, function `f` will contain only the
|
|
164 code responsible for creation and initialization of the coroutine frame and
|
|
165 execution of the coroutine until a suspend point is reached:
|
|
166
|
121
|
167 .. code-block:: llvm
|
120
|
168
|
|
169 define i8* @f(i32 %n) {
|
|
170 entry:
|
|
171 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
|
|
172 %alloc = call noalias i8* @malloc(i32 24)
|
|
173 %0 = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
|
|
174 %frame = bitcast i8* %0 to %f.frame*
|
|
175 %1 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 0
|
|
176 store void (%f.frame*)* @f.resume, void (%f.frame*)** %1
|
|
177 %2 = getelementptr %f.frame, %f.frame* %frame, i32 0, i32 1
|
|
178 store void (%f.frame*)* @f.destroy, void (%f.frame*)** %2
|
|
179
|
|
180 %inc = add nsw i32 %n, 1
|
|
181 %inc.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
|
|
182 store i32 %inc, i32* %inc.spill.addr
|
|
183 call void @print(i32 %n)
|
|
184
|
|
185 ret i8* %frame
|
|
186 }
|
|
187
|
|
188 Outlined resume part of the coroutine will reside in function `f.resume`:
|
|
189
|
|
190 .. code-block:: llvm
|
|
191
|
|
192 define internal fastcc void @f.resume(%f.frame* %frame.ptr.resume) {
|
|
193 entry:
|
|
194 %inc.spill.addr = getelementptr %f.frame, %f.frame* %frame.ptr.resume, i64 0, i32 2
|
|
195 %inc.spill = load i32, i32* %inc.spill.addr, align 4
|
|
196 %inc = add i32 %n.val, 1
|
|
197 store i32 %inc, i32* %inc.spill.addr, align 4
|
|
198 tail call void @print(i32 %inc)
|
|
199 ret void
|
|
200 }
|
|
201
|
|
202 Whereas function `f.destroy` will contain the cleanup code for the coroutine:
|
|
203
|
|
204 .. code-block:: llvm
|
|
205
|
|
206 define internal fastcc void @f.destroy(%f.frame* %frame.ptr.destroy) {
|
|
207 entry:
|
|
208 %0 = bitcast %f.frame* %frame.ptr.destroy to i8*
|
|
209 tail call void @free(i8* %0)
|
|
210 ret void
|
|
211 }
|
|
212
|
|
213 Avoiding Heap Allocations
|
|
214 -------------------------
|
|
215
|
|
216 A particular coroutine usage pattern, which is illustrated by the `main`
|
|
217 function in the overview section, where a coroutine is created, manipulated and
|
|
218 destroyed by the same calling function, is common for coroutines implementing
|
|
219 RAII idiom and is suitable for allocation elision optimization which avoid
|
|
220 dynamic allocation by storing the coroutine frame as a static `alloca` in its
|
|
221 caller.
|
|
222
|
|
223 In the entry block, we will call `coro.alloc`_ intrinsic that will return `true`
|
|
224 when dynamic allocation is required, and `false` if dynamic allocation is
|
|
225 elided.
|
|
226
|
121
|
227 .. code-block:: llvm
|
120
|
228
|
|
229 entry:
|
|
230 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
|
|
231 %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
|
|
232 br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
|
|
233 dyn.alloc:
|
|
234 %size = call i32 @llvm.coro.size.i32()
|
|
235 %alloc = call i8* @CustomAlloc(i32 %size)
|
|
236 br label %coro.begin
|
|
237 coro.begin:
|
|
238 %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ]
|
|
239 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi)
|
|
240
|
|
241 In the cleanup block, we will make freeing the coroutine frame conditional on
|
|
242 `coro.free`_ intrinsic. If allocation is elided, `coro.free`_ returns `null`
|
|
243 thus skipping the deallocation code:
|
|
244
|
121
|
245 .. code-block:: llvm
|
120
|
246
|
|
247 cleanup:
|
|
248 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
|
|
249 %need.dyn.free = icmp ne i8* %mem, null
|
|
250 br i1 %need.dyn.free, label %dyn.free, label %if.end
|
|
251 dyn.free:
|
|
252 call void @CustomFree(i8* %mem)
|
|
253 br label %if.end
|
|
254 if.end:
|
|
255 ...
|
|
256
|
|
257 With allocations and deallocations represented as described as above, after
|
|
258 coroutine heap allocation elision optimization, the resulting main will be:
|
|
259
|
|
260 .. code-block:: llvm
|
|
261
|
|
262 define i32 @main() {
|
|
263 entry:
|
|
264 call void @print(i32 4)
|
|
265 call void @print(i32 5)
|
|
266 call void @print(i32 6)
|
|
267 ret i32 0
|
|
268 }
|
|
269
|
|
270 Multiple Suspend Points
|
|
271 -----------------------
|
|
272
|
|
273 Let's consider the coroutine that has more than one suspend point:
|
|
274
|
|
275 .. code-block:: c++
|
|
276
|
|
277 void *f(int n) {
|
|
278 for(;;) {
|
|
279 print(n++);
|
|
280 <suspend>
|
|
281 print(-n);
|
|
282 <suspend>
|
|
283 }
|
|
284 }
|
|
285
|
|
286 Matching LLVM code would look like (with the rest of the code remaining the same
|
|
287 as the code in the previous section):
|
|
288
|
121
|
289 .. code-block:: llvm
|
120
|
290
|
|
291 loop:
|
|
292 %n.addr = phi i32 [ %n, %entry ], [ %inc, %loop.resume ]
|
|
293 call void @print(i32 %n.addr) #4
|
|
294 %2 = call i8 @llvm.coro.suspend(token none, i1 false)
|
|
295 switch i8 %2, label %suspend [i8 0, label %loop.resume
|
|
296 i8 1, label %cleanup]
|
|
297 loop.resume:
|
|
298 %inc = add nsw i32 %n.addr, 1
|
|
299 %sub = xor i32 %n.addr, -1
|
|
300 call void @print(i32 %sub)
|
|
301 %3 = call i8 @llvm.coro.suspend(token none, i1 false)
|
|
302 switch i8 %3, label %suspend [i8 0, label %loop
|
|
303 i8 1, label %cleanup]
|
|
304
|
|
305 In this case, the coroutine frame would include a suspend index that will
|
|
306 indicate at which suspend point the coroutine needs to resume. The resume
|
|
307 function will use an index to jump to an appropriate basic block and will look
|
|
308 as follows:
|
|
309
|
|
310 .. code-block:: llvm
|
|
311
|
|
312 define internal fastcc void @f.Resume(%f.Frame* %FramePtr) {
|
|
313 entry.Resume:
|
|
314 %index.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 2
|
|
315 %index = load i8, i8* %index.addr, align 1
|
|
316 %switch = icmp eq i8 %index, 0
|
|
317 %n.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i64 0, i32 3
|
|
318 %n = load i32, i32* %n.addr, align 4
|
|
319 br i1 %switch, label %loop.resume, label %loop
|
|
320
|
|
321 loop.resume:
|
|
322 %sub = xor i32 %n, -1
|
|
323 call void @print(i32 %sub)
|
|
324 br label %suspend
|
|
325 loop:
|
|
326 %inc = add nsw i32 %n, 1
|
|
327 store i32 %inc, i32* %n.addr, align 4
|
|
328 tail call void @print(i32 %inc)
|
|
329 br label %suspend
|
|
330
|
|
331 suspend:
|
|
332 %storemerge = phi i8 [ 0, %loop ], [ 1, %loop.resume ]
|
|
333 store i8 %storemerge, i8* %index.addr, align 1
|
|
334 ret void
|
|
335 }
|
|
336
|
|
337 If different cleanup code needs to get executed for different suspend points,
|
|
338 a similar switch will be in the `f.destroy` function.
|
|
339
|
|
340 .. note ::
|
|
341
|
|
342 Using suspend index in a coroutine state and having a switch in `f.resume` and
|
|
343 `f.destroy` is one of the possible implementation strategies. We explored
|
|
344 another option where a distinct `f.resume1`, `f.resume2`, etc. are created for
|
|
345 every suspend point, and instead of storing an index, the resume and destroy
|
|
346 function pointers are updated at every suspend. Early testing showed that the
|
|
347 current approach is easier on the optimizer than the latter so it is a
|
|
348 lowering strategy implemented at the moment.
|
|
349
|
|
350 Distinct Save and Suspend
|
|
351 -------------------------
|
|
352
|
|
353 In the previous example, setting a resume index (or some other state change that
|
|
354 needs to happen to prepare a coroutine for resumption) happens at the same time as
|
|
355 a suspension of a coroutine. However, in certain cases, it is necessary to control
|
|
356 when coroutine is prepared for resumption and when it is suspended.
|
|
357
|
|
358 In the following example, a coroutine represents some activity that is driven
|
|
359 by completions of asynchronous operations `async_op1` and `async_op2` which get
|
|
360 a coroutine handle as a parameter and resume the coroutine once async
|
|
361 operation is finished.
|
|
362
|
|
363 .. code-block:: text
|
|
364
|
|
365 void g() {
|
|
366 for (;;)
|
|
367 if (cond()) {
|
|
368 async_op1(<coroutine-handle>); // will resume once async_op1 completes
|
|
369 <suspend>
|
|
370 do_one();
|
|
371 }
|
|
372 else {
|
|
373 async_op2(<coroutine-handle>); // will resume once async_op2 completes
|
|
374 <suspend>
|
|
375 do_two();
|
|
376 }
|
|
377 }
|
|
378 }
|
|
379
|
|
380 In this case, coroutine should be ready for resumption prior to a call to
|
|
381 `async_op1` and `async_op2`. The `coro.save`_ intrinsic is used to indicate a
|
|
382 point when coroutine should be ready for resumption (namely, when a resume index
|
|
383 should be stored in the coroutine frame, so that it can be resumed at the
|
|
384 correct resume point):
|
|
385
|
121
|
386 .. code-block:: llvm
|
120
|
387
|
|
388 if.true:
|
|
389 %save1 = call token @llvm.coro.save(i8* %hdl)
|
121
|
390 call void @async_op1(i8* %hdl)
|
120
|
391 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
|
|
392 switch i8 %suspend1, label %suspend [i8 0, label %resume1
|
|
393 i8 1, label %cleanup]
|
|
394 if.false:
|
|
395 %save2 = call token @llvm.coro.save(i8* %hdl)
|
121
|
396 call void @async_op2(i8* %hdl)
|
120
|
397 %suspend2 = call i1 @llvm.coro.suspend(token %save2, i1 false)
|
|
398 switch i8 %suspend1, label %suspend [i8 0, label %resume2
|
|
399 i8 1, label %cleanup]
|
|
400
|
|
401 .. _coroutine promise:
|
|
402
|
|
403 Coroutine Promise
|
|
404 -----------------
|
|
405
|
|
406 A coroutine author or a frontend may designate a distinguished `alloca` that can
|
|
407 be used to communicate with the coroutine. This distinguished alloca is called
|
|
408 **coroutine promise** and is provided as the second parameter to the
|
|
409 `coro.id`_ intrinsic.
|
|
410
|
|
411 The following coroutine designates a 32 bit integer `promise` and uses it to
|
|
412 store the current value produced by a coroutine.
|
|
413
|
121
|
414 .. code-block:: llvm
|
120
|
415
|
|
416 define i8* @f(i32 %n) {
|
|
417 entry:
|
|
418 %promise = alloca i32
|
|
419 %pv = bitcast i32* %promise to i8*
|
|
420 %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null)
|
|
421 %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
|
|
422 br i1 %need.dyn.alloc, label %dyn.alloc, label %coro.begin
|
|
423 dyn.alloc:
|
|
424 %size = call i32 @llvm.coro.size.i32()
|
|
425 %alloc = call i8* @malloc(i32 %size)
|
|
426 br label %coro.begin
|
|
427 coro.begin:
|
|
428 %phi = phi i8* [ null, %entry ], [ %alloc, %dyn.alloc ]
|
|
429 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %phi)
|
|
430 br label %loop
|
|
431 loop:
|
|
432 %n.val = phi i32 [ %n, %coro.begin ], [ %inc, %loop ]
|
|
433 %inc = add nsw i32 %n.val, 1
|
|
434 store i32 %n.val, i32* %promise
|
|
435 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
|
|
436 switch i8 %0, label %suspend [i8 0, label %loop
|
|
437 i8 1, label %cleanup]
|
|
438 cleanup:
|
|
439 %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
|
|
440 call void @free(i8* %mem)
|
|
441 br label %suspend
|
|
442 suspend:
|
121
|
443 %unused = call i1 @llvm.coro.end(i8* %hdl, i1 false)
|
120
|
444 ret i8* %hdl
|
|
445 }
|
|
446
|
|
447 A coroutine consumer can rely on the `coro.promise`_ intrinsic to access the
|
|
448 coroutine promise.
|
|
449
|
|
450 .. code-block:: llvm
|
|
451
|
|
452 define i32 @main() {
|
|
453 entry:
|
|
454 %hdl = call i8* @f(i32 4)
|
|
455 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
|
|
456 %promise.addr = bitcast i8* %promise.addr.raw to i32*
|
|
457 %val0 = load i32, i32* %promise.addr
|
|
458 call void @print(i32 %val0)
|
|
459 call void @llvm.coro.resume(i8* %hdl)
|
|
460 %val1 = load i32, i32* %promise.addr
|
|
461 call void @print(i32 %val1)
|
|
462 call void @llvm.coro.resume(i8* %hdl)
|
|
463 %val2 = load i32, i32* %promise.addr
|
|
464 call void @print(i32 %val2)
|
|
465 call void @llvm.coro.destroy(i8* %hdl)
|
|
466 ret i32 0
|
|
467 }
|
|
468
|
|
469 After example in this section is compiled, result of the compilation will be:
|
|
470
|
|
471 .. code-block:: llvm
|
|
472
|
|
473 define i32 @main() {
|
|
474 entry:
|
|
475 tail call void @print(i32 4)
|
|
476 tail call void @print(i32 5)
|
|
477 tail call void @print(i32 6)
|
|
478 ret i32 0
|
|
479 }
|
|
480
|
|
481 .. _final:
|
|
482 .. _final suspend:
|
|
483
|
|
484 Final Suspend
|
|
485 -------------
|
|
486
|
|
487 A coroutine author or a frontend may designate a particular suspend to be final,
|
|
488 by setting the second argument of the `coro.suspend`_ intrinsic to `true`.
|
|
489 Such a suspend point has two properties:
|
|
490
|
|
491 * it is possible to check whether a suspended coroutine is at the final suspend
|
|
492 point via `coro.done`_ intrinsic;
|
|
493
|
|
494 * a resumption of a coroutine stopped at the final suspend point leads to
|
|
495 undefined behavior. The only possible action for a coroutine at a final
|
|
496 suspend point is destroying it via `coro.destroy`_ intrinsic.
|
|
497
|
|
498 From the user perspective, the final suspend point represents an idea of a
|
|
499 coroutine reaching the end. From the compiler perspective, it is an optimization
|
|
500 opportunity for reducing number of resume points (and therefore switch cases) in
|
|
501 the resume function.
|
|
502
|
|
503 The following is an example of a function that keeps resuming the coroutine
|
|
504 until the final suspend point is reached after which point the coroutine is
|
|
505 destroyed:
|
|
506
|
|
507 .. code-block:: llvm
|
|
508
|
|
509 define i32 @main() {
|
|
510 entry:
|
|
511 %hdl = call i8* @f(i32 4)
|
|
512 br label %while
|
|
513 while:
|
|
514 call void @llvm.coro.resume(i8* %hdl)
|
|
515 %done = call i1 @llvm.coro.done(i8* %hdl)
|
|
516 br i1 %done, label %end, label %while
|
|
517 end:
|
|
518 call void @llvm.coro.destroy(i8* %hdl)
|
|
519 ret i32 0
|
|
520 }
|
|
521
|
|
522 Usually, final suspend point is a frontend injected suspend point that does not
|
|
523 correspond to any explicitly authored suspend point of the high level language.
|
|
524 For example, for a Python generator that has only one suspend point:
|
|
525
|
|
526 .. code-block:: python
|
|
527
|
|
528 def coroutine(n):
|
|
529 for i in range(n):
|
|
530 yield i
|
|
531
|
|
532 Python frontend would inject two more suspend points, so that the actual code
|
|
533 looks like this:
|
|
534
|
|
535 .. code-block:: c
|
|
536
|
|
537 void* coroutine(int n) {
|
|
538 int current_value;
|
|
539 <designate current_value to be coroutine promise>
|
|
540 <SUSPEND> // injected suspend point, so that the coroutine starts suspended
|
|
541 for (int i = 0; i < n; ++i) {
|
|
542 current_value = i; <SUSPEND>; // corresponds to "yield i"
|
|
543 }
|
|
544 <SUSPEND final=true> // injected final suspend point
|
|
545 }
|
|
546
|
|
547 and python iterator `__next__` would look like:
|
|
548
|
|
549 .. code-block:: c++
|
|
550
|
|
551 int __next__(void* hdl) {
|
|
552 coro.resume(hdl);
|
|
553 if (coro.done(hdl)) throw StopIteration();
|
|
554 return *(int*)coro.promise(hdl, 4, false);
|
|
555 }
|
|
556
|
|
557 Intrinsics
|
|
558 ==========
|
|
559
|
|
560 Coroutine Manipulation Intrinsics
|
|
561 ---------------------------------
|
|
562
|
|
563 Intrinsics described in this section are used to manipulate an existing
|
|
564 coroutine. They can be used in any function which happen to have a pointer
|
|
565 to a `coroutine frame`_ or a pointer to a `coroutine promise`_.
|
|
566
|
|
567 .. _coro.destroy:
|
|
568
|
|
569 'llvm.coro.destroy' Intrinsic
|
|
570 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
571
|
|
572 Syntax:
|
|
573 """""""
|
|
574
|
|
575 ::
|
|
576
|
|
577 declare void @llvm.coro.destroy(i8* <handle>)
|
|
578
|
|
579 Overview:
|
|
580 """""""""
|
|
581
|
|
582 The '``llvm.coro.destroy``' intrinsic destroys a suspended
|
|
583 coroutine.
|
|
584
|
|
585 Arguments:
|
|
586 """"""""""
|
|
587
|
|
588 The argument is a coroutine handle to a suspended coroutine.
|
|
589
|
|
590 Semantics:
|
|
591 """"""""""
|
|
592
|
|
593 When possible, the `coro.destroy` intrinsic is replaced with a direct call to
|
|
594 the coroutine destroy function. Otherwise it is replaced with an indirect call
|
|
595 based on the function pointer for the destroy function stored in the coroutine
|
|
596 frame. Destroying a coroutine that is not suspended leads to undefined behavior.
|
|
597
|
|
598 .. _coro.resume:
|
|
599
|
|
600 'llvm.coro.resume' Intrinsic
|
|
601 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
602
|
|
603 ::
|
|
604
|
|
605 declare void @llvm.coro.resume(i8* <handle>)
|
|
606
|
|
607 Overview:
|
|
608 """""""""
|
|
609
|
|
610 The '``llvm.coro.resume``' intrinsic resumes a suspended coroutine.
|
|
611
|
|
612 Arguments:
|
|
613 """"""""""
|
|
614
|
|
615 The argument is a handle to a suspended coroutine.
|
|
616
|
|
617 Semantics:
|
|
618 """"""""""
|
|
619
|
|
620 When possible, the `coro.resume` intrinsic is replaced with a direct call to the
|
|
621 coroutine resume function. Otherwise it is replaced with an indirect call based
|
|
622 on the function pointer for the resume function stored in the coroutine frame.
|
|
623 Resuming a coroutine that is not suspended leads to undefined behavior.
|
|
624
|
|
625 .. _coro.done:
|
|
626
|
|
627 'llvm.coro.done' Intrinsic
|
|
628 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
629
|
|
630 ::
|
|
631
|
|
632 declare i1 @llvm.coro.done(i8* <handle>)
|
|
633
|
|
634 Overview:
|
|
635 """""""""
|
|
636
|
|
637 The '``llvm.coro.done``' intrinsic checks whether a suspended coroutine is at
|
|
638 the final suspend point or not.
|
|
639
|
|
640 Arguments:
|
|
641 """"""""""
|
|
642
|
|
643 The argument is a handle to a suspended coroutine.
|
|
644
|
|
645 Semantics:
|
|
646 """"""""""
|
|
647
|
|
648 Using this intrinsic on a coroutine that does not have a `final suspend`_ point
|
|
649 or on a coroutine that is not suspended leads to undefined behavior.
|
|
650
|
|
651 .. _coro.promise:
|
|
652
|
|
653 'llvm.coro.promise' Intrinsic
|
|
654 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
655
|
|
656 ::
|
|
657
|
|
658 declare i8* @llvm.coro.promise(i8* <ptr>, i32 <alignment>, i1 <from>)
|
|
659
|
|
660 Overview:
|
|
661 """""""""
|
|
662
|
|
663 The '``llvm.coro.promise``' intrinsic obtains a pointer to a
|
|
664 `coroutine promise`_ given a coroutine handle and vice versa.
|
|
665
|
|
666 Arguments:
|
|
667 """"""""""
|
|
668
|
|
669 The first argument is a handle to a coroutine if `from` is false. Otherwise,
|
|
670 it is a pointer to a coroutine promise.
|
|
671
|
|
672 The second argument is an alignment requirements of the promise.
|
|
673 If a frontend designated `%promise = alloca i32` as a promise, the alignment
|
|
674 argument to `coro.promise` should be the alignment of `i32` on the target
|
|
675 platform. If a frontend designated `%promise = alloca i32, align 16` as a
|
|
676 promise, the alignment argument should be 16.
|
|
677 This argument only accepts constants.
|
|
678
|
|
679 The third argument is a boolean indicating a direction of the transformation.
|
|
680 If `from` is true, the intrinsic returns a coroutine handle given a pointer
|
|
681 to a promise. If `from` is false, the intrinsics return a pointer to a promise
|
|
682 from a coroutine handle. This argument only accepts constants.
|
|
683
|
|
684 Semantics:
|
|
685 """"""""""
|
|
686
|
|
687 Using this intrinsic on a coroutine that does not have a coroutine promise
|
|
688 leads to undefined behavior. It is possible to read and modify coroutine
|
|
689 promise of the coroutine which is currently executing. The coroutine author and
|
|
690 a coroutine user are responsible to makes sure there is no data races.
|
|
691
|
|
692 Example:
|
|
693 """"""""
|
|
694
|
121
|
695 .. code-block:: llvm
|
120
|
696
|
|
697 define i8* @f(i32 %n) {
|
|
698 entry:
|
|
699 %promise = alloca i32
|
|
700 %pv = bitcast i32* %promise to i8*
|
|
701 ; the second argument to coro.id points to the coroutine promise.
|
|
702 %id = call token @llvm.coro.id(i32 0, i8* %pv, i8* null, i8* null)
|
|
703 ...
|
|
704 %hdl = call noalias i8* @llvm.coro.begin(token %id, i8* %alloc)
|
|
705 ...
|
|
706 store i32 42, i32* %promise ; store something into the promise
|
|
707 ...
|
|
708 ret i8* %hdl
|
|
709 }
|
|
710
|
|
711 define i32 @main() {
|
|
712 entry:
|
|
713 %hdl = call i8* @f(i32 4) ; starts the coroutine and returns its handle
|
|
714 %promise.addr.raw = call i8* @llvm.coro.promise(i8* %hdl, i32 4, i1 false)
|
|
715 %promise.addr = bitcast i8* %promise.addr.raw to i32*
|
|
716 %val = load i32, i32* %promise.addr ; load a value from the promise
|
|
717 call void @print(i32 %val)
|
|
718 call void @llvm.coro.destroy(i8* %hdl)
|
|
719 ret i32 0
|
|
720 }
|
|
721
|
|
722 .. _coroutine intrinsics:
|
|
723
|
|
724 Coroutine Structure Intrinsics
|
|
725 ------------------------------
|
|
726 Intrinsics described in this section are used within a coroutine to describe
|
|
727 the coroutine structure. They should not be used outside of a coroutine.
|
|
728
|
|
729 .. _coro.size:
|
|
730
|
|
731 'llvm.coro.size' Intrinsic
|
|
732 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
733 ::
|
|
734
|
|
735 declare i32 @llvm.coro.size.i32()
|
|
736 declare i64 @llvm.coro.size.i64()
|
|
737
|
|
738 Overview:
|
|
739 """""""""
|
|
740
|
|
741 The '``llvm.coro.size``' intrinsic returns the number of bytes
|
|
742 required to store a `coroutine frame`_.
|
|
743
|
|
744 Arguments:
|
|
745 """"""""""
|
|
746
|
|
747 None
|
|
748
|
|
749 Semantics:
|
|
750 """"""""""
|
|
751
|
|
752 The `coro.size` intrinsic is lowered to a constant representing the size of
|
|
753 the coroutine frame.
|
|
754
|
|
755 .. _coro.begin:
|
|
756
|
|
757 'llvm.coro.begin' Intrinsic
|
|
758 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
759 ::
|
|
760
|
|
761 declare i8* @llvm.coro.begin(token <id>, i8* <mem>)
|
|
762
|
|
763 Overview:
|
|
764 """""""""
|
|
765
|
|
766 The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.
|
|
767
|
|
768 Arguments:
|
|
769 """"""""""
|
|
770
|
|
771 The first argument is a token returned by a call to '``llvm.coro.id``'
|
|
772 identifying the coroutine.
|
|
773
|
|
774 The second argument is a pointer to a block of memory where coroutine frame
|
|
775 will be stored if it is allocated dynamically.
|
|
776
|
|
777 Semantics:
|
|
778 """"""""""
|
|
779
|
|
780 Depending on the alignment requirements of the objects in the coroutine frame
|
|
781 and/or on the codegen compactness reasons the pointer returned from `coro.begin`
|
|
782 may be at offset to the `%mem` argument. (This could be beneficial if
|
|
783 instructions that express relative access to data can be more compactly encoded
|
|
784 with small positive and negative offsets).
|
|
785
|
|
786 A frontend should emit exactly one `coro.begin` intrinsic per coroutine.
|
|
787
|
|
788 .. _coro.free:
|
|
789
|
|
790 'llvm.coro.free' Intrinsic
|
|
791 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
792 ::
|
|
793
|
|
794 declare i8* @llvm.coro.free(token %id, i8* <frame>)
|
|
795
|
|
796 Overview:
|
|
797 """""""""
|
|
798
|
|
799 The '``llvm.coro.free``' intrinsic returns a pointer to a block of memory where
|
|
800 coroutine frame is stored or `null` if this instance of a coroutine did not use
|
|
801 dynamically allocated memory for its coroutine frame.
|
|
802
|
|
803 Arguments:
|
|
804 """"""""""
|
|
805
|
|
806 The first argument is a token returned by a call to '``llvm.coro.id``'
|
|
807 identifying the coroutine.
|
|
808
|
|
809 The second argument is a pointer to the coroutine frame. This should be the same
|
|
810 pointer that was returned by prior `coro.begin` call.
|
|
811
|
|
812 Example (custom deallocation function):
|
|
813 """""""""""""""""""""""""""""""""""""""
|
|
814
|
121
|
815 .. code-block:: llvm
|
120
|
816
|
|
817 cleanup:
|
|
818 %mem = call i8* @llvm.coro.free(token %id, i8* %frame)
|
|
819 %mem_not_null = icmp ne i8* %mem, null
|
|
820 br i1 %mem_not_null, label %if.then, label %if.end
|
|
821 if.then:
|
|
822 call void @CustomFree(i8* %mem)
|
|
823 br label %if.end
|
|
824 if.end:
|
|
825 ret void
|
|
826
|
|
827 Example (standard deallocation functions):
|
|
828 """"""""""""""""""""""""""""""""""""""""""
|
|
829
|
121
|
830 .. code-block:: llvm
|
120
|
831
|
|
832 cleanup:
|
|
833 %mem = call i8* @llvm.coro.free(token %id, i8* %frame)
|
|
834 call void @free(i8* %mem)
|
|
835 ret void
|
|
836
|
|
837 .. _coro.alloc:
|
|
838
|
|
839 'llvm.coro.alloc' Intrinsic
|
|
840 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
841 ::
|
|
842
|
|
843 declare i1 @llvm.coro.alloc(token <id>)
|
|
844
|
|
845 Overview:
|
|
846 """""""""
|
|
847
|
|
848 The '``llvm.coro.alloc``' intrinsic returns `true` if dynamic allocation is
|
121
|
849 required to obtain a memory for the coroutine frame and `false` otherwise.
|
120
|
850
|
|
851 Arguments:
|
|
852 """"""""""
|
|
853
|
|
854 The first argument is a token returned by a call to '``llvm.coro.id``'
|
|
855 identifying the coroutine.
|
|
856
|
|
857 Semantics:
|
|
858 """"""""""
|
|
859
|
|
860 A frontend should emit at most one `coro.alloc` intrinsic per coroutine.
|
|
861 The intrinsic is used to suppress dynamic allocation of the coroutine frame
|
|
862 when possible.
|
|
863
|
|
864 Example:
|
|
865 """"""""
|
|
866
|
121
|
867 .. code-block:: llvm
|
120
|
868
|
|
869 entry:
|
|
870 %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
|
|
871 %dyn.alloc.required = call i1 @llvm.coro.alloc(token %id)
|
|
872 br i1 %dyn.alloc.required, label %coro.alloc, label %coro.begin
|
|
873
|
|
874 coro.alloc:
|
|
875 %frame.size = call i32 @llvm.coro.size()
|
|
876 %alloc = call i8* @MyAlloc(i32 %frame.size)
|
|
877 br label %coro.begin
|
|
878
|
|
879 coro.begin:
|
|
880 %phi = phi i8* [ null, %entry ], [ %alloc, %coro.alloc ]
|
|
881 %frame = call i8* @llvm.coro.begin(token %id, i8* %phi)
|
|
882
|
147
|
883 .. _coro.noop:
|
|
884
|
|
885 'llvm.coro.noop' Intrinsic
|
|
886 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
887 ::
|
|
888
|
|
889 declare i8* @llvm.coro.noop()
|
|
890
|
|
891 Overview:
|
|
892 """""""""
|
|
893
|
|
894 The '``llvm.coro.noop``' intrinsic returns an address of the coroutine frame of
|
|
895 a coroutine that does nothing when resumed or destroyed.
|
|
896
|
|
897 Arguments:
|
|
898 """"""""""
|
|
899
|
|
900 None
|
|
901
|
|
902 Semantics:
|
|
903 """"""""""
|
|
904
|
|
905 This intrinsic is lowered to refer to a private constant coroutine frame. The
|
|
906 resume and destroy handlers for this frame are empty functions that do nothing.
|
|
907 Note that in different translation units llvm.coro.noop may return different pointers.
|
|
908
|
120
|
909 .. _coro.frame:
|
|
910
|
|
911 'llvm.coro.frame' Intrinsic
|
|
912 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
913 ::
|
|
914
|
|
915 declare i8* @llvm.coro.frame()
|
|
916
|
|
917 Overview:
|
|
918 """""""""
|
|
919
|
|
920 The '``llvm.coro.frame``' intrinsic returns an address of the coroutine frame of
|
|
921 the enclosing coroutine.
|
|
922
|
|
923 Arguments:
|
|
924 """"""""""
|
|
925
|
|
926 None
|
|
927
|
|
928 Semantics:
|
|
929 """"""""""
|
|
930
|
|
931 This intrinsic is lowered to refer to the `coro.begin`_ instruction. This is
|
|
932 a frontend convenience intrinsic that makes it easier to refer to the
|
|
933 coroutine frame.
|
|
934
|
|
935 .. _coro.id:
|
|
936
|
|
937 'llvm.coro.id' Intrinsic
|
|
938 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
939 ::
|
|
940
|
|
941 declare token @llvm.coro.id(i32 <align>, i8* <promise>, i8* <coroaddr>,
|
|
942 i8* <fnaddrs>)
|
|
943
|
|
944 Overview:
|
|
945 """""""""
|
|
946
|
|
947 The '``llvm.coro.id``' intrinsic returns a token identifying a coroutine.
|
|
948
|
|
949 Arguments:
|
|
950 """"""""""
|
|
951
|
|
952 The first argument provides information on the alignment of the memory returned
|
|
953 by the allocation function and given to `coro.begin` by the first argument. If
|
|
954 this argument is 0, the memory is assumed to be aligned to 2 * sizeof(i8*).
|
|
955 This argument only accepts constants.
|
|
956
|
|
957 The second argument, if not `null`, designates a particular alloca instruction
|
|
958 to be a `coroutine promise`_.
|
|
959
|
|
960 The third argument is `null` coming out of the frontend. The CoroEarly pass sets
|
|
961 this argument to point to the function this coro.id belongs to.
|
|
962
|
|
963 The fourth argument is `null` before coroutine is split, and later is replaced
|
|
964 to point to a private global constant array containing function pointers to
|
|
965 outlined resume and destroy parts of the coroutine.
|
|
966
|
|
967
|
|
968 Semantics:
|
|
969 """"""""""
|
|
970
|
|
971 The purpose of this intrinsic is to tie together `coro.id`, `coro.alloc` and
|
|
972 `coro.begin` belonging to the same coroutine to prevent optimization passes from
|
|
973 duplicating any of these instructions unless entire body of the coroutine is
|
|
974 duplicated.
|
|
975
|
|
976 A frontend should emit exactly one `coro.id` intrinsic per coroutine.
|
|
977
|
|
978 .. _coro.end:
|
|
979
|
|
980 'llvm.coro.end' Intrinsic
|
|
981 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
982 ::
|
|
983
|
121
|
984 declare i1 @llvm.coro.end(i8* <handle>, i1 <unwind>)
|
120
|
985
|
|
986 Overview:
|
|
987 """""""""
|
|
988
|
|
989 The '``llvm.coro.end``' marks the point where execution of the resume part of
|
121
|
990 the coroutine should end and control should return to the caller.
|
120
|
991
|
|
992
|
|
993 Arguments:
|
|
994 """"""""""
|
|
995
|
121
|
996 The first argument should refer to the coroutine handle of the enclosing
|
|
997 coroutine. A frontend is allowed to supply null as the first parameter, in this
|
|
998 case `coro-early` pass will replace the null with an appropriate coroutine
|
|
999 handle value.
|
120
|
1000
|
|
1001 The second argument should be `true` if this coro.end is in the block that is
|
121
|
1002 part of the unwind sequence leaving the coroutine body due to an exception and
|
|
1003 `false` otherwise.
|
120
|
1004
|
|
1005 Semantics:
|
|
1006 """"""""""
|
121
|
1007 The purpose of this intrinsic is to allow frontends to mark the cleanup and
|
|
1008 other code that is only relevant during the initial invocation of the coroutine
|
|
1009 and should not be present in resume and destroy parts.
|
120
|
1010
|
121
|
1011 This intrinsic is lowered when a coroutine is split into
|
|
1012 the start, resume and destroy parts. In the start part, it is a no-op,
|
|
1013 in resume and destroy parts, it is replaced with `ret void` instruction and
|
120
|
1014 the rest of the block containing `coro.end` instruction is discarded.
|
121
|
1015 In landing pads it is replaced with an appropriate instruction to unwind to
|
|
1016 caller. The handling of coro.end differs depending on whether the target is
|
|
1017 using landingpad or WinEH exception model.
|
120
|
1018
|
121
|
1019 For landingpad based exception model, it is expected that frontend uses the
|
|
1020 `coro.end`_ intrinsic as follows:
|
|
1021
|
|
1022 .. code-block:: llvm
|
|
1023
|
|
1024 ehcleanup:
|
|
1025 %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true)
|
|
1026 br i1 %InResumePart, label %eh.resume, label %cleanup.cont
|
|
1027
|
|
1028 cleanup.cont:
|
|
1029 ; rest of the cleanup
|
|
1030
|
|
1031 eh.resume:
|
|
1032 %exn = load i8*, i8** %exn.slot, align 8
|
|
1033 %sel = load i32, i32* %ehselector.slot, align 4
|
|
1034 %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0
|
|
1035 %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
|
|
1036 resume { i8*, i32 } %lpad.val29
|
120
|
1037
|
121
|
1038 The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions,
|
|
1039 thus leading to immediate unwind to the caller, whereas in start function it
|
|
1040 is replaced with ``False``, thus allowing to proceed to the rest of the cleanup
|
|
1041 code that is only needed during initial invocation of the coroutine.
|
|
1042
|
|
1043 For Windows Exception handling model, a frontend should attach a funclet bundle
|
|
1044 referring to an enclosing cleanuppad as follows:
|
|
1045
|
|
1046 .. code-block:: llvm
|
|
1047
|
|
1048 ehcleanup:
|
|
1049 %tok = cleanuppad within none []
|
|
1050 %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ]
|
|
1051 cleanupret from %tok unwind label %RestOfTheCleanup
|
|
1052
|
|
1053 The `CoroSplit` pass, if the funclet bundle is present, will insert
|
|
1054 ``cleanupret from %tok unwind to caller`` before
|
|
1055 the `coro.end`_ intrinsic and will remove the rest of the block.
|
|
1056
|
|
1057 The following table summarizes the handling of `coro.end`_ intrinsic.
|
|
1058
|
|
1059 +--------------------------+-------------------+-------------------------------+
|
|
1060 | | In Start Function | In Resume/Destroy Functions |
|
|
1061 +--------------------------+-------------------+-------------------------------+
|
|
1062 |unwind=false | nothing |``ret void`` |
|
|
1063 +------------+-------------+-------------------+-------------------------------+
|
|
1064 | | WinEH | nothing |``cleanupret unwind to caller``|
|
|
1065 |unwind=true +-------------+-------------------+-------------------------------+
|
|
1066 | | Landingpad | nothing | nothing |
|
|
1067 +------------+-------------+-------------------+-------------------------------+
|
120
|
1068
|
|
1069 .. _coro.suspend:
|
|
1070 .. _suspend points:
|
|
1071
|
|
1072 'llvm.coro.suspend' Intrinsic
|
|
1073 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
1074 ::
|
|
1075
|
|
1076 declare i8 @llvm.coro.suspend(token <save>, i1 <final>)
|
|
1077
|
|
1078 Overview:
|
|
1079 """""""""
|
|
1080
|
|
1081 The '``llvm.coro.suspend``' marks the point where execution of the coroutine
|
|
1082 need to get suspended and control returned back to the caller.
|
|
1083 Conditional branches consuming the result of this intrinsic lead to basic blocks
|
|
1084 where coroutine should proceed when suspended (-1), resumed (0) or destroyed
|
|
1085 (1).
|
|
1086
|
|
1087 Arguments:
|
|
1088 """"""""""
|
|
1089
|
|
1090 The first argument refers to a token of `coro.save` intrinsic that marks the
|
|
1091 point when coroutine state is prepared for suspension. If `none` token is passed,
|
|
1092 the intrinsic behaves as if there were a `coro.save` immediately preceding
|
|
1093 the `coro.suspend` intrinsic.
|
|
1094
|
|
1095 The second argument indicates whether this suspension point is `final`_.
|
|
1096 The second argument only accepts constants. If more than one suspend point is
|
|
1097 designated as final, the resume and destroy branches should lead to the same
|
|
1098 basic blocks.
|
|
1099
|
|
1100 Example (normal suspend point):
|
|
1101 """""""""""""""""""""""""""""""
|
|
1102
|
121
|
1103 .. code-block:: llvm
|
120
|
1104
|
|
1105 %0 = call i8 @llvm.coro.suspend(token none, i1 false)
|
|
1106 switch i8 %0, label %suspend [i8 0, label %resume
|
|
1107 i8 1, label %cleanup]
|
|
1108
|
|
1109 Example (final suspend point):
|
|
1110 """"""""""""""""""""""""""""""
|
|
1111
|
121
|
1112 .. code-block:: llvm
|
120
|
1113
|
|
1114 while.end:
|
|
1115 %s.final = call i8 @llvm.coro.suspend(token none, i1 true)
|
|
1116 switch i8 %s.final, label %suspend [i8 0, label %trap
|
|
1117 i8 1, label %cleanup]
|
|
1118 trap:
|
|
1119 call void @llvm.trap()
|
|
1120 unreachable
|
|
1121
|
|
1122 Semantics:
|
|
1123 """"""""""
|
|
1124
|
|
1125 If a coroutine that was suspended at the suspend point marked by this intrinsic
|
|
1126 is resumed via `coro.resume`_ the control will transfer to the basic block
|
|
1127 of the 0-case. If it is resumed via `coro.destroy`_, it will proceed to the
|
|
1128 basic block indicated by the 1-case. To suspend, coroutine proceed to the
|
|
1129 default label.
|
|
1130
|
|
1131 If suspend intrinsic is marked as final, it can consider the `true` branch
|
|
1132 unreachable and can perform optimizations that can take advantage of that fact.
|
|
1133
|
|
1134 .. _coro.save:
|
|
1135
|
|
1136 'llvm.coro.save' Intrinsic
|
|
1137 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
1138 ::
|
|
1139
|
|
1140 declare token @llvm.coro.save(i8* <handle>)
|
|
1141
|
|
1142 Overview:
|
|
1143 """""""""
|
|
1144
|
|
1145 The '``llvm.coro.save``' marks the point where a coroutine need to update its
|
|
1146 state to prepare for resumption to be considered suspended (and thus eligible
|
|
1147 for resumption).
|
|
1148
|
|
1149 Arguments:
|
|
1150 """"""""""
|
|
1151
|
|
1152 The first argument points to a coroutine handle of the enclosing coroutine.
|
|
1153
|
|
1154 Semantics:
|
|
1155 """"""""""
|
|
1156
|
|
1157 Whatever coroutine state changes are required to enable resumption of
|
|
1158 the coroutine from the corresponding suspend point should be done at the point
|
|
1159 of `coro.save` intrinsic.
|
|
1160
|
|
1161 Example:
|
|
1162 """"""""
|
|
1163
|
|
1164 Separate save and suspend points are necessary when a coroutine is used to
|
|
1165 represent an asynchronous control flow driven by callbacks representing
|
|
1166 completions of asynchronous operations.
|
|
1167
|
|
1168 In such a case, a coroutine should be ready for resumption prior to a call to
|
|
1169 `async_op` function that may trigger resumption of a coroutine from the same or
|
|
1170 a different thread possibly prior to `async_op` call returning control back
|
|
1171 to the coroutine:
|
|
1172
|
121
|
1173 .. code-block:: llvm
|
120
|
1174
|
|
1175 %save1 = call token @llvm.coro.save(i8* %hdl)
|
121
|
1176 call void @async_op1(i8* %hdl)
|
120
|
1177 %suspend1 = call i1 @llvm.coro.suspend(token %save1, i1 false)
|
|
1178 switch i8 %suspend1, label %suspend [i8 0, label %resume1
|
|
1179 i8 1, label %cleanup]
|
|
1180
|
|
1181 .. _coro.param:
|
|
1182
|
|
1183 'llvm.coro.param' Intrinsic
|
|
1184 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
1185 ::
|
|
1186
|
|
1187 declare i1 @llvm.coro.param(i8* <original>, i8* <copy>)
|
|
1188
|
|
1189 Overview:
|
|
1190 """""""""
|
|
1191
|
|
1192 The '``llvm.coro.param``' is used by a frontend to mark up the code used to
|
|
1193 construct and destruct copies of the parameters. If the optimizer discovers that
|
|
1194 a particular parameter copy is not used after any suspends, it can remove the
|
|
1195 construction and destruction of the copy by replacing corresponding coro.param
|
|
1196 with `i1 false` and replacing any use of the `copy` with the `original`.
|
|
1197
|
|
1198 Arguments:
|
|
1199 """"""""""
|
|
1200
|
|
1201 The first argument points to an `alloca` storing the value of a parameter to a
|
|
1202 coroutine.
|
|
1203
|
|
1204 The second argument points to an `alloca` storing the value of the copy of that
|
|
1205 parameter.
|
|
1206
|
|
1207 Semantics:
|
|
1208 """"""""""
|
|
1209
|
|
1210 The optimizer is free to always replace this intrinsic with `i1 true`.
|
|
1211
|
|
1212 The optimizer is also allowed to replace it with `i1 false` provided that the
|
|
1213 parameter copy is only used prior to control flow reaching any of the suspend
|
|
1214 points. The code that would be DCE'd if the `coro.param` is replaced with
|
|
1215 `i1 false` is not considered to be a use of the parameter copy.
|
|
1216
|
|
1217 The frontend can emit this intrinsic if its language rules allow for this
|
|
1218 optimization.
|
|
1219
|
|
1220 Example:
|
|
1221 """"""""
|
|
1222 Consider the following example. A coroutine takes two parameters `a` and `b`
|
|
1223 that has a destructor and a move constructor.
|
|
1224
|
|
1225 .. code-block:: c++
|
|
1226
|
|
1227 struct A { ~A(); A(A&&); bool foo(); void bar(); };
|
|
1228
|
|
1229 task<int> f(A a, A b) {
|
|
1230 if (a.foo())
|
|
1231 return 42;
|
|
1232
|
|
1233 a.bar();
|
|
1234 co_await read_async(); // introduces suspend point
|
|
1235 b.bar();
|
|
1236 }
|
|
1237
|
|
1238 Note that, uses of `b` is used after a suspend point and thus must be copied
|
|
1239 into a coroutine frame, whereas `a` does not have to, since it never used
|
|
1240 after suspend.
|
|
1241
|
|
1242 A frontend can create parameter copies for `a` and `b` as follows:
|
|
1243
|
|
1244 .. code-block:: text
|
|
1245
|
|
1246 task<int> f(A a', A b') {
|
|
1247 a = alloca A;
|
|
1248 b = alloca A;
|
|
1249 // move parameters to its copies
|
|
1250 if (coro.param(a', a)) A::A(a, A&& a');
|
|
1251 if (coro.param(b', b)) A::A(b, A&& b');
|
|
1252 ...
|
|
1253 // destroy parameters copies
|
|
1254 if (coro.param(a', a)) A::~A(a);
|
|
1255 if (coro.param(b', b)) A::~A(b);
|
|
1256 }
|
|
1257
|
|
1258 The optimizer can replace coro.param(a',a) with `i1 false` and replace all uses
|
|
1259 of `a` with `a'`, since it is not used after suspend.
|
|
1260
|
|
1261 The optimizer must replace coro.param(b', b) with `i1 true`, since `b` is used
|
|
1262 after suspend and therefore, it has to reside in the coroutine frame.
|
|
1263
|
|
1264 Coroutine Transformation Passes
|
|
1265 ===============================
|
|
1266 CoroEarly
|
|
1267 ---------
|
|
1268 The pass CoroEarly lowers coroutine intrinsics that hide the details of the
|
|
1269 structure of the coroutine frame, but, otherwise not needed to be preserved to
|
|
1270 help later coroutine passes. This pass lowers `coro.frame`_, `coro.done`_,
|
|
1271 and `coro.promise`_ intrinsics.
|
|
1272
|
|
1273 .. _CoroSplit:
|
|
1274
|
|
1275 CoroSplit
|
|
1276 ---------
|
|
1277 The pass CoroSplit buides coroutine frame and outlines resume and destroy parts
|
|
1278 into separate functions.
|
|
1279
|
|
1280 CoroElide
|
|
1281 ---------
|
|
1282 The pass CoroElide examines if the inlined coroutine is eligible for heap
|
|
1283 allocation elision optimization. If so, it replaces
|
|
1284 `coro.begin` intrinsic with an address of a coroutine frame placed on its caller
|
|
1285 and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
|
|
1286 respectively to remove the deallocation code.
|
|
1287 This pass also replaces `coro.resume` and `coro.destroy` intrinsics with direct
|
|
1288 calls to resume and destroy functions for a particular coroutine where possible.
|
|
1289
|
|
1290 CoroCleanup
|
|
1291 -----------
|
|
1292 This pass runs late to lower all coroutine related intrinsics not replaced by
|
|
1293 earlier passes.
|
|
1294
|
|
1295 Areas Requiring Attention
|
|
1296 =========================
|
|
1297 #. A coroutine frame is bigger than it could be. Adding stack packing and stack
|
|
1298 coloring like optimization on the coroutine frame will result in tighter
|
|
1299 coroutine frames.
|
|
1300
|
|
1301 #. Take advantage of the lifetime intrinsics for the data that goes into the
|
|
1302 coroutine frame. Leave lifetime intrinsics as is for the data that stays in
|
|
1303 allocas.
|
|
1304
|
|
1305 #. The CoroElide optimization pass relies on coroutine ramp function to be
|
|
1306 inlined. It would be beneficial to split the ramp function further to
|
|
1307 increase the chance that it will get inlined into its caller.
|
|
1308
|
|
1309 #. Design a convention that would make it possible to apply coroutine heap
|
|
1310 elision optimization across ABI boundaries.
|
|
1311
|
|
1312 #. Cannot handle coroutines with `inalloca` parameters (used in x86 on Windows).
|
|
1313
|
|
1314 #. Alignment is ignored by coro.begin and coro.free intrinsics.
|
|
1315
|
|
1316 #. Make required changes to make sure that coroutine optimizations work with
|
|
1317 LTO.
|
|
1318
|
|
1319 #. More tests, more tests, more tests
|