=====================================
Garbage Collection Safepoints in LLVM
=====================================

.. contents::
   :local:
   :depth: 2
|
|
8
|
|
Status
======

This document describes a set of extensions to LLVM to support garbage
collection. By now, these mechanisms are well proven: a commercial Java
implementation with a fully relocating collector has shipped using them.
There are a couple of places where bugs might still linger; these are called
out below.

They are still listed as "experimental" to indicate that no forward or backward
compatibility guarantees are offered across versions. If your use case is such
that you need some form of forward compatibility guarantee, please raise the
issue on the llvm-dev mailing list.

LLVM still supports an alternate mechanism for conservative garbage collection
support using the ``gcroot`` intrinsic. The ``gcroot`` mechanism is mostly of
historical interest at this point, with one exception: its implementation of
shadow stacks has been used successfully by a number of language frontends and
is still supported.
|
83
|
28
|
|
Overview
========

To collect dead objects, garbage collectors must be able to identify
any references to objects contained within executing code, and,
depending on the collector, potentially update them. The collector
does not need this information at all points in code - that would make
the problem much harder - but only at well-defined points in the
execution known as 'safepoints'. For most collectors, it is sufficient
to track at least one copy of each unique pointer value. However, for
a collector which wishes to relocate objects directly reachable from
running code, a higher standard is required.

One additional challenge is that the compiler may compute intermediate
results ("derived pointers") which point outside of the allocation or
even into the middle of another allocation. The eventual use of this
intermediate value must yield an address within the bounds of the
allocation, but such "exterior derived pointers" may be visible to the
collector. Given this, a garbage collector cannot safely rely on the
runtime value of an address to indicate the object it is associated
with. If the garbage collector wishes to move any object, the
compiler must provide a mapping, for each pointer, to an indication of
its allocation.

To simplify the interaction between a collector and the compiled code,
most garbage collectors are organized in terms of three abstractions:
load barriers, store barriers, and safepoints.
|
|
56
|
|
#. A load barrier is a bit of code executed immediately after the
   machine load instruction, but before any use of the value loaded.
   Depending on the collector, such a barrier may be needed for all
   loads, merely loads of a particular type (in the original source
   language), or none at all.

#. Analogously, a store barrier is a code fragment that runs
   immediately before the machine store instruction, but after the
   computation of the value stored. The most common use of a store
   barrier is to update a 'card table' in a generational garbage
   collector.

#. A safepoint is a location at which pointers visible to the compiled
   code (i.e. currently in registers or on the stack) are allowed to
   change. After the safepoint completes, the actual pointer value
   may differ, but the 'object' (as seen by the source language)
   pointed to will not.

Note that the term 'safepoint' is somewhat overloaded. It refers to
both the location at which the machine state is parsable and the
coordination protocol involved in bringing application threads to a
point at which the collector can safely use that information. The
term "statepoint" as used in this document refers exclusively to the
former.
|
|
81
|
|
This document focuses on the last item - compiler support for
safepoints in generated code. We will assume that an outside
mechanism has decided where to place safepoints. From our
perspective, all safepoints will be function calls. To support
relocation of objects directly reachable from values in compiled code,
the collector must be able to:

#. identify every copy of a pointer (including copies introduced by
   the compiler itself) at the safepoint,
#. identify which object each pointer relates to, and
#. potentially update each of those copies.

This document describes the mechanism by which an LLVM based compiler
can provide this information to a language runtime/collector, and
ensure that all pointers can be read and updated if desired.
|
|
97
|
|
At a high level, LLVM has been extended to support compiling to an abstract
machine which extends the actual target with a non-integral pointer type
suitable for representing a garbage collected reference to an object. In
particular, such non-integral pointer types have no defined mapping to an
integer representation. This semantic quirk allows the runtime to pick an
integer mapping for each point in the program, allowing relocations of objects
without visible effects.

Warning: Non-Integral Pointer Types are a newly added concept in LLVM IR.
It's possible that we've missed disabling some of the optimizations which
assume an integral value for pointers. If you find such a case, please
file a bug or share a patch.

Warning: There is one currently known semantic hole in the definition of
non-integral pointers which has not been addressed upstream. To work around
this, you need to disable speculation of loads unless the memory type
(non-integral pointer vs anything else) is known to be unchanged. That is, it
is not safe to speculate a load if doing so causes a non-integral pointer value
to be loaded as any other type or vice versa. In practice, this restriction is
well isolated to isSafeToSpeculate in ValueTracking.cpp.
|
|
118
|
|
This high level abstract machine model is used for most of the LLVM optimizer.
Before starting code generation, we switch representations to an explicit form.
In theory, a frontend could directly generate this low level explicit form, but
doing so is likely to inhibit optimization.

The heart of the explicit approach is to construct (or rewrite) the IR in a
manner where the possible updates performed by the garbage collector are
explicitly visible in the IR. Doing so requires that we:

#. create a new SSA value for each potentially relocated pointer, and
   ensure that no uses of the original (non relocated) value are
   reachable after the safepoint,
#. specify the relocation in a way which is opaque to the compiler to
   ensure that the optimizer cannot introduce new uses of an
   unrelocated value after a statepoint. This prevents the optimizer
   from performing unsound optimizations.
#. record a mapping of live pointers (and the allocation they're
   associated with) for each statepoint.
|
|
137
|
|
At the most abstract level, inserting a safepoint can be thought of as
replacing a call instruction with a call to a multiple return value
function which calls the original target of the call, returns
its result, and returns updated values for any live pointers to
garbage collected objects.

Note that the task of identifying all live pointers to garbage
collected values, transforming the IR to expose a pointer giving the
base object for every such live pointer, and inserting all the
intrinsics correctly is explicitly out of scope for this document.
The recommended approach is to use the :ref:`utility passes
<statepoint-utilities>` described below.

This abstract function call is concretely represented by a sequence of
intrinsic calls known collectively as a "statepoint relocation sequence".
|
83
|
153
|
|
Let's consider a simple call in LLVM IR:

.. code-block:: llvm

  define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
         gc "statepoint-example" {
    call void ()* @foo()
    ret i8 addrspace(1)* %obj
  }
|
|
163
|
|
Depending on our language we may need to allow a safepoint during the execution
of ``foo``. If so, we need to let the collector update local values in the
current frame. If we don't, we'll be accessing a potentially invalid reference
once we eventually return from the call.

In this example, we need to relocate the SSA value ``%obj``. Since we can't
actually change the value in the SSA value ``%obj``, we need to introduce a new
SSA value ``%obj.relocated`` which represents the potentially changed value of
``%obj`` after the safepoint and update any following uses appropriately. The
resulting relocation sequence is:
|
83
|
174
|
121
|
.. code-block:: llvm

  define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
         gc "statepoint-example" {
    %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
    %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 7, i32 7)
    ret i8 addrspace(1)* %obj.relocated
  }
|
83
|
183
|
|
Ideally, this sequence would have been represented as an M argument, N
return value function (where M is the number of values being
relocated + the original call arguments and N is the original return
value + each relocated value), but LLVM does not easily support such a
representation.

Instead, the statepoint intrinsic marks the actual site of the
safepoint or statepoint. The statepoint returns a token value (which
exists only at compile time). To get back the original return value
of the call, we use the ``gc.result`` intrinsic. To get the relocation
of each pointer in turn, we use the ``gc.relocate`` intrinsic with the
appropriate index. Note that both the ``gc.relocate`` and ``gc.result`` are
tied to the statepoint. The combination forms a "statepoint relocation
sequence" and represents the entirety of a parseable call or 'statepoint'.
|
|
198
|
|
When lowered, this example would generate the following x86 assembly:

.. code-block:: gas

      .globl  test1
      .align  16, 0x90
      pushq   %rax
      callq   foo
  .Ltmp1:
      movq    (%rsp), %rax  # This load is redundant (oops!)
      popq    %rdx
      retq
|
83
|
211
|
|
Each of the potentially relocated values has been spilled to the
stack, and that location has been recorded in the
:ref:`Stack Map section <stackmap-section>`. If the garbage collector
needs to update any of these pointers during the call, it knows
exactly what to change.
|
|
217
|
95
|
The relevant parts of the StackMap section for our example are:

.. code-block:: gas

  # This describes the call site
  # Stack Maps: callsite 2882400000
      .quad   2882400000
      .long   .Ltmp1-test1
      .short  0
  # .. 8 entries skipped ..
  # This entry describes the spill slot which is directly addressable
  # off RSP with offset 0. Given the value was spilled with a pushq,
  # that makes sense.
  # Stack Maps:   Loc 8: Direct RSP     [encoding: .byte 2, .byte 8, .short 7, .int 0]
      .byte   2
      .byte   8
      .short  7
      .long   0
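
As a rough illustration of how a runtime might read the Location entry shown
above, the following Python sketch decodes the four directives. The field order
(kind, size, DWARF register number, offset) is read off the comment in the
listing. The names ``decode_location`` and ``LOCATION_KINDS`` are made up for
this example, and the kind numbering is assumed to follow the Stack Map format
documentation.

.. code-block:: python

  import struct

  # Location kinds, assumed to be numbered as in the Stack Map format docs.
  LOCATION_KINDS = {1: "Register", 2: "Direct", 3: "Indirect",
                    4: "Constant", 5: "ConstIndex"}


  def decode_location(raw: bytes):
      """Decode one 8-byte Location: .byte kind, .byte size, .short reg, .long offset."""
      kind, size, dwarf_reg, offset = struct.unpack("<BBHi", raw)
      return LOCATION_KINDS[kind], size, dwarf_reg, offset


  # The four directives from the listing, assembled little-endian.
  raw = struct.pack("<BBHi", 2, 8, 7, 0)
  print(decode_location(raw))  # ('Direct', 8, 7, 0)

DWARF register 7 is RSP on x86-64, so the decoded entry reads as "an 8-byte
slot at RSP + 0", which matches the ``pushq`` spill in the generated assembly.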
|
|
236
|
121
|
This example was taken from the tests for the :ref:`RewriteStatepointsForGC`
utility pass. As such, its full StackMap can be easily examined with the
following command.

.. code-block:: bash

  opt -rewrite-statepoints-for-gc test/Transforms/RewriteStatepointsForGC/basics.ll -S | llc -debug-only=stackmaps
|
|
244
|
|
Base & Derived Pointers
^^^^^^^^^^^^^^^^^^^^^^^

A "base pointer" is one which points to the starting address of an allocation
(object). A "derived pointer" is one which is offset from a base pointer by
some amount. When relocating objects, a garbage collector needs to be able
to relocate each derived pointer associated with an allocation to the same
offset from the new address.
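
The arithmetic a collector performs for a derived pointer can be sketched in a
few lines of Python. This is only an illustration of the "same offset from the
new address" rule; the function name and the sample addresses are invented for
the example and do not correspond to any LLVM API.

.. code-block:: python

  def relocate_derived(old_base: int, new_base: int, old_derived: int) -> int:
      """Return the address a derived pointer must have after its object moves."""
      # The offset may be negative or larger than the object for exterior
      # derived pointers, which is why the base must be known explicitly.
      offset = old_derived - old_base
      return new_base + offset


  # An object moves from 0x1000 to 0x8000; a derived pointer sat 0x20 bytes in.
  new_derived = relocate_derived(0x1000, 0x8000, 0x1020)

Note that the computation needs the old base address as an input. It cannot be
recovered from the derived pointer alone when that pointer lies outside the
allocation.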
|
|
253
|
|
"Interior derived pointers" remain within the bounds of the allocation
they're associated with. As a result, the base object can be found at
runtime provided the bounds of allocations are known to the runtime system.

"Exterior derived pointers" are outside the bounds of the associated object;
they may even fall within *another* allocation's address range. As a result,
there is no way for a garbage collector to determine which allocation they
are associated with at runtime and compiler support is needed.

The ``gc.relocate`` intrinsic supports an explicit operand for describing the
allocation associated with a derived pointer. This operand is frequently
referred to as the base operand. Strictly speaking, it does not have to be
a base pointer, but it does need to lie within the bounds of the associated
allocation. Some collectors may require that the operand be an actual base
pointer rather than merely an internal derived pointer. Note that during
lowering both the base and derived pointer operands are required to be live
over the associated call safepoint even if the base is otherwise unused
afterwards.
|
|
272
|
|
If we extend our previous example to include a pointless derived pointer,
we get:

.. code-block:: llvm

  define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
         gc "statepoint-example" {
    %gep = getelementptr i8, i8 addrspace(1)* %obj, i64 20000
    %token = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj, i8 addrspace(1)* %gep)
    %obj.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %token, i32 7, i32 7)
    %gep.relocated = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %token, i32 7, i32 8)
    %p = getelementptr i8, i8 addrspace(1)* %gep.relocated, i64 -20000
    ret i8 addrspace(1)* %p
  }

Note that in this example ``%p`` and ``%obj.relocated`` are the same address
and we could replace one with the other, potentially removing the derived
pointer from the live set at the safepoint entirely.
|
|
291
|
|
.. _gc_transition_args:

GC Transitions
^^^^^^^^^^^^^^^^^^

As a practical consideration, many garbage-collected systems allow code that is
collector-aware ("managed code") to call code that is not collector-aware
("unmanaged code"). It is common that such calls must also be safepoints, since
it is desirable to allow the collector to run during the execution of
unmanaged code. Furthermore, it is common that coordinating the transition from
managed to unmanaged code requires extra code generation at the call site to
inform the collector of the transition. In order to support these needs, a
statepoint may be marked as a GC transition, and data that is necessary to
perform the transition (if any) may be provided as additional arguments to the
statepoint.

Note that although in many cases statepoints may be inferred to be GC
transitions based on the function symbols involved (e.g. a call from a
function with GC strategy "foo" to a function with GC strategy "bar"),
indirect calls that are also GC transitions must also be supported. This
requirement is the driving force behind the decision to require that GC
transitions are explicitly marked.

Let's revisit the sample given above, this time treating the call to ``@foo``
as a GC transition. Depending on our target, the transition code may need to
access some extra state in order to inform the collector of the transition.
Let's assume a hypothetical GC--somewhat unimaginatively named "hypothetical-gc"
--that requires that a TLS variable must be written to before and after a call
to unmanaged code. The resulting relocation sequence is:
|
|
321
|
121
|
.. code-block:: llvm

  @Flag = thread_local global i32 0, align 4

  define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
         gc "hypothetical-gc" {

    ; Operands after @foo: 0 call args, flags = 1 (GC transition),
    ; 1 transition arg (@Flag), 0 deopt args, then the gc parameters.
    %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* @foo, i32 0, i32 1, i32 1, i32* @Flag, i32 0, i8 addrspace(1)* %obj)
    %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 8, i32 8)
    ret i8 addrspace(1)* %obj.relocated
  }
|
|
333
|
|
During lowering, this will result in an instruction selection DAG that looks
something like:

::

  CALLSEQ_START
  ...
  GC_TRANSITION_START (lowered i32 *@Flag), SRCVALUE i32* Flag
  STATEPOINT
  GC_TRANSITION_END (lowered i32 *@Flag), SRCVALUE i32 *Flag
  ...
  CALLSEQ_END
|
|
346
|
|
In order to generate the necessary transition code, the backend for each target
supported by "hypothetical-gc" must be modified to lower ``GC_TRANSITION_START``
and ``GC_TRANSITION_END`` nodes appropriately when the "hypothetical-gc"
strategy is in use for a particular function. Assuming that such lowering has
been added for X86, the generated assembly would be:

.. code-block:: gas

      .globl  test1
      .align  16, 0x90
      pushq   %rax
      movl    $1, %fs:Flag@TPOFF
      callq   foo
      movl    $0, %fs:Flag@TPOFF
  .Ltmp1:
      movq    (%rsp), %rax  # This load is redundant (oops!)
      popq    %rdx
      retq
|
|
365
|
|
Note that the design as presented above is not fully implemented: in particular,
strategy-specific lowering is not present, and all GC transitions are emitted as
a single no-op before and after the call instruction. These no-ops are often
removed by the backend during dead machine instruction elimination.
|
|
370
|
|
371
|
83
|
Intrinsics
===========

'llvm.experimental.gc.statepoint' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare token
        @llvm.experimental.gc.statepoint(i64 <id>, i32 <num patch bytes>,
                       func_type <target>,
                       i64 <#call args>, i64 <flags>,
                       ... (call parameters),
                       i64 <# transition args>, ... (transition parameters),
                       i64 <# deopt args>, ... (deopt parameters),
                       ... (gc parameters))

Overview:
"""""""""

The statepoint intrinsic represents a call which is parse-able by the
runtime.
|
|
397
|
|
Operands:
"""""""""

The 'id' operand is a constant integer that is reported as the ID
field in the generated stackmap. LLVM does not interpret this
parameter in any way and its meaning is up to the statepoint user to
decide. Note that LLVM is free to duplicate code containing
statepoint calls, and this may transform IR that had a unique 'id' per
lexical call to statepoint to IR that does not.

If 'num patch bytes' is non-zero then the call instruction
corresponding to the statepoint is not emitted and LLVM emits 'num
patch bytes' bytes of nops in its place. LLVM will emit code to
prepare the function arguments and retrieve the function return value
in accordance with the calling convention; the former before the nop
sequence and the latter after the nop sequence. It is expected that
the user will patch over the 'num patch bytes' bytes of nops with a
calling sequence specific to their runtime before executing the
generated machine code. There are no guarantees with respect to the
alignment of the nop sequence. Unlike :doc:`StackMaps`, statepoints do
not have a concept of shadow bytes. Note that semantically the
statepoint still represents a call or invoke to 'target', and the nop
sequence after patching is expected to represent an operation
equivalent to a call or invoke to 'target'.
|
|
422
|
83
|
The 'target' operand is the function actually being called. The
target can be specified as either a symbolic LLVM function, or as an
arbitrary Value of appropriate function type. Note that the function
type must match the signature of the callee and the types of the 'call
parameters' arguments.

The '#call args' operand is the number of arguments to the actual
call. It must exactly match the number of arguments passed in the
'call parameters' variable length section.

The 'flags' operand is used to specify extra information about the
statepoint. This is currently only used to mark certain statepoints
as GC transitions. This operand is a 64-bit integer with the following
layout, where bit 0 is the least significant bit:

+-------+---------------------------------------------------+
| Bit # | Usage                                             |
+=======+===================================================+
|     0 | Set if the statepoint is a GC transition, cleared |
|       | otherwise.                                        |
+-------+---------------------------------------------------+
|  1-63 | Reserved for future use; must be cleared.         |
+-------+---------------------------------------------------+
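
The flag layout in the table can be checked with a couple of bit masks. The
following Python sketch is purely illustrative: the names
``GC_TRANSITION_BIT``, ``RESERVED_MASK`` and ``decode_statepoint_flags`` are
hypothetical and are not part of LLVM.

.. code-block:: python

  GC_TRANSITION_BIT = 1 << 0  # bit 0: the statepoint is a GC transition
  RESERVED_MASK = ~GC_TRANSITION_BIT & 0xFFFFFFFFFFFFFFFF  # bits 1-63


  def decode_statepoint_flags(flags: int) -> bool:
      """Return True if 'flags' marks a GC transition; reject reserved bits."""
      if not 0 <= flags < 1 << 64:
          raise ValueError("flags must fit in 64 bits")
      if flags & RESERVED_MASK:
          raise ValueError("reserved flag bits 1-63 must be cleared")
      return bool(flags & GC_TRANSITION_BIT)

With this layout the only two valid values are 0 (an ordinary statepoint) and
1 (a GC transition).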
|
83
|
446
|
|
The 'call parameters' arguments are simply the arguments which need to
be passed to the call target. They will be lowered according to the
specified calling convention and otherwise handled like a normal call
instruction. The number of arguments must exactly match what is
specified in '# call args'. The types must match the signature of
'target'.

The 'transition parameters' arguments contain an arbitrary list of
Values which need to be passed to GC transition code. They will be
lowered and passed as operands to the appropriate GC_TRANSITION nodes
in the selection DAG. It is assumed that these arguments must be
available before and after (but not necessarily during) the execution
of the callee. The '# transition args' field indicates how many operands
are to be interpreted as 'transition parameters'.

The 'deopt parameters' arguments contain an arbitrary list of Values
which is meaningful to the runtime. The runtime may read any of these
values, but is assumed not to modify them. If the garbage collector
might need to modify one of these values, it must also be listed in
the 'gc parameters' argument list. The '# deopt args' field indicates
how many operands are to be interpreted as 'deopt parameters'.

The 'gc parameters' arguments contain every pointer to a garbage
collected object which potentially needs to be updated by the garbage
collector. Note that the argument list must explicitly contain a base
pointer for every derived pointer listed. The order of arguments is
unimportant. Unlike the other variable length parameter sets, this
list is not length prefixed.
|
|
475
|
|
Semantics:
""""""""""

A statepoint is assumed to read and write all memory. As a result,
memory operations cannot be reordered past a statepoint. It is
illegal to mark a statepoint as being either 'readonly' or 'readnone'.

Note that legal IR cannot perform any memory operation on a 'gc
pointer' argument of the statepoint in a location statically reachable
from the statepoint. Instead, the explicitly relocated value (from a
``gc.relocate``) must be used.
|
83
|
487
|
95
|
'llvm.experimental.gc.result' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare type*
        @llvm.experimental.gc.result(token %statepoint_token)

Overview:
"""""""""

``gc.result`` extracts the result of the original call instruction
which was replaced by the ``gc.statepoint``. The ``gc.result``
intrinsic is actually a family of three intrinsics due to an
implementation limitation. Other than the type of the return value,
the semantics are the same.

Operands:
"""""""""

The first and only argument is the ``gc.statepoint`` which starts
the safepoint sequence of which this ``gc.result`` is a part.
Despite the typing of this as a generic token, *only* the value defined
by a ``gc.statepoint`` is legal here.

Semantics:
""""""""""

The ``gc.result`` represents the return value of the call target of
the ``statepoint``. The type of the ``gc.result`` must exactly match
the return type of the target. If the call target returns void, there
will be no ``gc.result``.

A ``gc.result`` is modeled as a 'readnone' pure function. It has no
side effects since it is just a projection of the return value of the
previous call represented by the ``gc.statepoint``.
|
83
|
527
|
95
|
'llvm.experimental.gc.relocate' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare <pointer type>
        @llvm.experimental.gc.relocate(token %statepoint_token,
                                       i32 %base_offset,
                                       i32 %pointer_offset)

Overview:
"""""""""

A ``gc.relocate`` returns the potentially relocated value of a pointer
at the safepoint.
|
|
546
|
|
Operands:
"""""""""

The first argument is the ``gc.statepoint`` which starts the
safepoint sequence of which this ``gc.relocate`` is a part.
Despite the typing of this as a generic token, *only* the value defined
by a ``gc.statepoint`` is legal here.

The second argument is an index into the statepoint's list of arguments
which specifies the allocation for the pointer being relocated.
This index must land within the 'gc parameter' section of the
statepoint's argument list. The associated value must be within the
object with which the pointer being relocated is associated. The optimizer
is free to change *which* interior derived pointer is reported, provided that
it does not replace an actual base pointer with another interior derived
pointer. Collectors are allowed to rely on the base pointer operand
remaining an actual base pointer if so constructed.

The third argument is an index into the statepoint's list of arguments
which specifies the (potentially) derived pointer being relocated. It
is legal for this index to be the same as the second argument
if-and-only-if a base pointer is being relocated. This index must land
within the 'gc parameter' section of the statepoint's argument list.
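
Because both indices count from the start of the statepoint's full argument
list, the position of the first 'gc parameter' depends on the lengths of the
sections before it. The Python sketch below works this out from the operand
layout given in the ``gc.statepoint`` syntax. The helper names are hypothetical
and exist only for this illustration.

.. code-block:: python

  FIXED_PREFIX = 5  # id, num patch bytes, target, #call args, flags


  def first_gc_param_index(num_call_args: int, num_transition_args: int,
                           num_deopt_args: int) -> int:
      """Index of the first 'gc parameter' in the statepoint argument list."""
      index = FIXED_PREFIX + num_call_args
      index += 1 + num_transition_args  # length prefix + transition parameters
      index += 1 + num_deopt_args       # length prefix + deopt parameters
      return index


  def is_valid_relocate_index(index: int, first_gc: int,
                              num_gc_params: int) -> bool:
      """True if 'index' lands within the 'gc parameter' section."""
      return first_gc <= index < first_gc + num_gc_params

For a statepoint with no call, transition, or deopt arguments, as in the first
example of this document, ``first_gc_param_index(0, 0, 0)`` evaluates to 7,
which is the index used by that example's ``gc.relocate``.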
|
|
570
|
|
Semantics:
""""""""""

The return value of ``gc.relocate`` is the potentially relocated value
of the pointer specified by its arguments. It is unspecified how the
value of the returned pointer relates to the argument to the
``gc.statepoint`` other than that a) it points to the same source
language object with the same offset, and b) the 'based-on'
relationship of the newly relocated pointers is a projection of the
unrelocated pointers. In particular, the integer value of the pointer
returned is unspecified.

A ``gc.relocate`` is modeled as a ``readnone`` pure function. It has no
side effects since it is just a way to extract information about work
done during the actual call modeled by the ``gc.statepoint``.
|
83
|
586
|
95
|
.. _statepoint-stackmap-format:

Stack Map Format
================

Locations for each pointer value which may need to be read and/or updated by
the runtime or collector are provided via the :ref:`Stack Map format
<stackmap-format>` specified in the PatchPoint documentation.
|
|
595
|
|
Each statepoint generates the following Locations:

* Constant which describes the calling convention of the call target. This
  constant is a valid :ref:`calling convention identifier <callingconv>` for
  the version of LLVM used to generate the stackmap. No additional compatibility
  guarantees are made for this constant over what LLVM provides elsewhere w.r.t.
  these identifiers.
* Constant which describes the flags passed to the statepoint intrinsic
* Constant which describes number of following deopt *Locations* (not
  operands)
* Variable number of Locations, one for each deopt parameter listed in
  the IR statepoint (same number as described by previous Constant). At
  the moment, only deopt parameters with a bitwidth of 64 bits or less
  are supported. Values of a type larger than 64 bits can be specified
  and reported only if a) the value is constant at the call site, and b)
  the constant can be represented with less than 64 bits (assuming zero
  extension to the original bitwidth).
* Variable number of relocation records, each of which consists of
  exactly two Locations. Relocation records are described in detail
  below.
|
|
616
|
|
617 Each relocation record provides sufficient information for a collector to
|
|
618 relocate one or more derived pointers. Each record consists of a pair of
|
|
619 Locations. The second element in the record represents the pointer (or
|
|
620 pointers) which need updated. The first element in the record provides a
|
|
621 pointer to the base of the object with which the pointer(s) being relocated is
|
|
622 associated. This information is required for handling generalized derived
|
|
623 pointers since a pointer may be outside the bounds of the original allocation,
|
|
624 but still needs to be relocated with the allocation. Additionally:
|
|
625
|
|
626 * It is guaranteed that the base pointer must also appear explicitly as a
|
|
627 relocation pair if used after the statepoint.
|
|
628 * There may be fewer relocation records then gc parameters in the IR
|
83
|
629 statepoint. Each *unique* pair will occur at least once; duplicates
|
100
|
630 are possible.
|
|
631 * The Locations within each record may either be of pointer size or a
|
|
632 multiple of pointer size. In the later case, the record must be
|
|
633 interpreted as describing a sequence of pointers and their corresponding
|
|
634 base pointers. If the Location is of size N x sizeof(pointer), then
|
|
635 there will be N records of one pointer each contained within the Location.
|
|
636 Both Locations in a pair can be assumed to be of the same size.
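
As a rough illustration of what a relocation record describes, consider the
following IR, written before any statepoint rewriting has been applied. The
function name ``@read_field`` and the offset ``16`` are hypothetical. ``%field``
is derived from ``%obj`` and is live across the call. If it is reported in the
stackmap (rather than being recomputed after the call from the relocated base),
its record would pair the Location of ``%obj`` (the base, first element) with
the Location of ``%field`` (the pointer to update, second element).

.. code-block:: llvm

  declare void @foo()

  define i8 @read_field(i8 addrspace(1)* %obj) gc "statepoint-example" {
  entry:
    ; %field points into the object referenced by %obj (a derived pointer).
    %field = getelementptr i8, i8 addrspace(1)* %obj, i64 16
    ; The call may contain a safepoint at which the object is moved.
    call void @foo()
    ; %field is used after the call, so it is live across the safepoint.
    %v = load i8, i8 addrspace(1)* %field
    ret i8 %v
  }

A collector that moves the object would typically update the derived pointer by
preserving its offset from the base, which is why the base is reported alongside
it.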

Note that the Locations used in each section may describe the same
physical location; e.g., a stack slot may appear as a deopt location,
a gc base pointer, and a gc derived pointer.

The LiveOut section of the StkMapRecord will be empty for a statepoint
record.

Safepoint Semantics & Verification
==================================

The fundamental correctness property for compiled code w.r.t. the garbage
collector is a dynamic one. It must be the case that there is no dynamic
trace such that an operation involving a potentially relocated pointer is
observably-after a safepoint which could relocate it. 'Observably-after' in
this usage means that an outside observer could observe this sequence of
events in a way which precludes the operation being performed before the
safepoint.

To understand why this 'observably-after' property is required,
consider a null comparison performed on the original copy of a
relocated pointer. Assuming that control flow follows the safepoint,
there is no way to observe externally whether the null comparison is
performed before or after the safepoint. (Remember, the original
Value is unmodified by the safepoint.) The compiler is free to make
either scheduling choice.
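
The null comparison case can be sketched in the explicit statepoint form. The
``gc.statepoint`` and ``gc.relocate`` lines below are copied from the
RewriteStatepointsForGC example elsewhere in this document and use that
example's IR syntax; the function name ``@is_null`` and the comparison are
illustrative additions, not taken from the LLVM sources.

.. code-block:: llvm

  define i1 @is_null(i8 addrspace(1)* %obj)
         gc "statepoint-example" {
    %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
    %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 12, i32 12)
    ; This compares the original %obj after the statepoint. Relocation does
    ; not change whether a pointer is null, so the result is the same as if
    ; the comparison had been scheduled before the statepoint.
    %is.null = icmp eq i8 addrspace(1)* %obj, null
    ret i1 %is.null
  }

By contrast, a load through ``%obj`` at the same position would be
observably-after the safepoint and would have to use ``%obj.relocated``
instead.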

The actual correctness property implemented is slightly stronger than
this. We require that there be no *static path* on which a use of a
potentially relocated pointer is 'observably-after' the point at which it
may have been relocated. This is slightly stronger than is strictly
necessary (and thus may disallow some otherwise valid programs), but greatly
simplifies reasoning about correctness of the compiled code.

By construction, this property will be upheld by the optimizer if
correctly established in the source IR. This is a key invariant of
the design.

The existing IR Verifier pass has been extended to check most of the
local restrictions on the intrinsics mentioned in their respective
documentation. The current implementation in LLVM does not check the
key relocation invariant, but there is ongoing work on developing such
a verifier. Please ask on llvm-dev if you're interested in
experimenting with the current version.

.. _statepoint-utilities:

Utility Passes for Safepoint Insertion
======================================

.. _RewriteStatepointsForGC:

RewriteStatepointsForGC
^^^^^^^^^^^^^^^^^^^^^^^^

The pass RewriteStatepointsForGC transforms a function's IR to lower from the
abstract machine model described above to the explicit statepoint model of
relocations. To do this, it replaces all calls or invokes of functions which
might contain a safepoint poll with a ``gc.statepoint`` and associated full
relocation sequence, including all required ``gc.relocates``.

Note that by default, this pass only runs for the "statepoint-example" or
"core-clr" gc strategies. You will need to add your custom strategy to this
whitelist or use one of the predefined ones.

As an example, given this code:

.. code-block:: llvm

  define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
         gc "statepoint-example" {
    call void @foo()
    ret i8 addrspace(1)* %obj
  }

The pass would produce this IR:

.. code-block:: llvm

  define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj)
         gc "statepoint-example" {
    %0 = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @foo, i32 0, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i8 addrspace(1)* %obj)
    %obj.relocated = call coldcc i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %0, i32 12, i32 12)
    ret i8 addrspace(1)* %obj.relocated
  }

In the above examples, the addrspace(1) marker on the pointers is the mechanism
that the ``statepoint-example`` GC strategy uses to distinguish references from
non references. The pass assumes that all addrspace(1) pointers are non-integral
pointer types. Address space 1 is not globally reserved for this purpose.

This pass can be used as a utility by a language frontend that doesn't
want to manually reason about liveness, base pointers, or relocation when
constructing IR. As currently implemented, RewriteStatepointsForGC must be
run after SSA construction (i.e. mem2reg).

RewriteStatepointsForGC will ensure that appropriate base pointers are listed
for every relocation created. It will do so by duplicating code as needed to
propagate the base pointer associated with each pointer being relocated to
the appropriate safepoints. The implementation assumes that the following
IR constructs produce base pointers: loads from the heap, addresses of global
variables, function arguments, and function return values. Constant pointers
(such as null) are also assumed to be base pointers. In practice, this
constraint can be relaxed to producing interior derived pointers provided the
target collector can find the associated allocation from an arbitrary interior
derived pointer.

By default RewriteStatepointsForGC passes in ``0xABCDEF00`` as the statepoint
ID and ``0`` as the number of patchable bytes to the newly constructed
``gc.statepoint``. These values can be configured on a per-callsite
basis using the attributes ``"statepoint-id"`` and
``"statepoint-num-patch-bytes"``. If a call site is marked with a
``"statepoint-id"`` function attribute and its value is a positive
integer (represented as a string), then that value is used as the ID
of the newly constructed ``gc.statepoint``. If a call site is marked
with a ``"statepoint-num-patch-bytes"`` function attribute and its
value is a positive integer, then that value is used as the 'num patch
bytes' parameter of the newly constructed ``gc.statepoint``. The
``"statepoint-id"`` and ``"statepoint-num-patch-bytes"`` attributes
are not propagated to the ``gc.statepoint`` call or invoke if they
could be successfully parsed.
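
A minimal sketch of these call-site attributes, shown as input to the pass,
might look as follows. The function name ``@test_attrs`` and the values ``42``
and ``16`` are arbitrary illustrations.

.. code-block:: llvm

  declare void @foo()

  define void @test_attrs() gc "statepoint-example" {
    ; Request statepoint ID 42 and 16 patchable bytes for this call site.
    call void @foo() "statepoint-id"="42" "statepoint-num-patch-bytes"="16"
    ; No attributes: the defaults (ID 0xABCDEF00, 0 patch bytes) are used.
    call void @foo()
    ret void
  }

After rewriting, the first call would be expected to become a
``gc.statepoint`` whose ID and 'num patch bytes' operands are ``i64 42`` and
``i32 16``, while the second keeps the default values.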

In practice, RewriteStatepointsForGC should be run much later in the pass
pipeline, after most optimization is already done. This helps to improve
the quality of the generated code when compiled with garbage collection support.

.. _PlaceSafepoints:

PlaceSafepoints
^^^^^^^^^^^^^^^^

The pass PlaceSafepoints inserts safepoint polls sufficient to ensure running
code checks for a safepoint request in a timely manner. This pass is expected
to be run before RewriteStatepointsForGC and thus does not produce full
relocation sequences.

As an example, given input IR of the following:

.. code-block:: llvm

  define void @test() gc "statepoint-example" {
    call void @foo()
    ret void
  }

  declare void @do_safepoint()
  define void @gc.safepoint_poll() {
    call void @do_safepoint()
    ret void
  }

This pass would produce the following IR:

.. code-block:: llvm

  define void @test() gc "statepoint-example" {
    call void @do_safepoint()
    call void @foo()
    ret void
  }

In this case, we've added an (unconditional) entry safepoint poll. Note that
despite appearances, the entry poll is not necessarily redundant. We'd have to
know that ``foo`` and ``test`` were not mutually recursive for the poll to be
redundant. In practice, you'd probably want your poll definition to contain
a conditional branch of some form.
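
One possible shape for such a conditional poll is sketched below. The global
``@safepoint_requested`` is a hypothetical flag that the runtime would set when
it wants threads to stop; it is not something LLVM provides, and a real runtime
might use a different mechanism (for example a per-thread flag).

.. code-block:: llvm

  ; Hypothetical flag set by the runtime to request a safepoint.
  @safepoint_requested = external global i32

  declare void @do_safepoint()

  define void @gc.safepoint_poll() {
  entry:
    %flag = load volatile i32, i32* @safepoint_requested
    %requested = icmp ne i32 %flag, 0
    br i1 %requested, label %slow, label %done

  slow:
    ; Only enter the runtime when a safepoint has actually been requested.
    call void @do_safepoint()
    br label %done

  done:
    ret void
  }

With a definition along these lines, the fast path at each poll site is a load,
a compare, and a branch, and the call into the runtime is only taken when a
safepoint is pending.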

At the moment, PlaceSafepoints can insert safepoint polls at method entry and
loop backedge locations. Extending this to work with return polls would be
straightforward if desired.

PlaceSafepoints includes a number of optimizations to avoid placing safepoint
polls at particular sites unless needed to ensure timely execution of a poll
under normal conditions. PlaceSafepoints does not attempt to ensure timely
execution of a poll under worst case conditions such as heavy system paging.

The implementation of a safepoint poll action is specified by looking up a
function of the name ``gc.safepoint_poll`` in the containing Module. The body
of this function is inserted at each poll site desired. While calls or invokes
inside this method are transformed into ``gc.statepoint`` calls, recursive poll
insertion is not performed.

This pass is useful for any language frontend which only has to support
garbage collection semantics at safepoints. If you need other abstract
frame information at safepoints (e.g. for deoptimization or introspection),
you can insert safepoint polls in the frontend. If you have the latter case,
please ask on llvm-dev for suggestions. There's been a good amount of work
done on making such a scheme work well in practice which is not yet documented
here.

Supported Architectures
=======================

Support for statepoint generation requires some code for each backend.
Today, only X86_64 is supported.

Problem Areas and Active Work
=============================

#. Support for languages which allow unmanaged pointers to garbage collected
   objects (i.e. pass a pointer to an object to a C routine) via pinning.

#. Support for garbage collected objects allocated on the stack. Specifically,
   allocas are always assumed to be in address space 0 and we need a
   cast/promotion operator to let rewriting identify them.

#. The current statepoint lowering is known to be somewhat poor. In the very
   long term, we'd like to integrate statepoints with the register allocator;
   in the near term this is unlikely to happen. We've found the quality of
   lowering to be relatively unimportant as hot statepoints are almost always
   inliner bugs.

#. Concerns have been raised that the statepoint representation results in a
   large amount of IR being produced for some examples and that this
   contributes to higher than expected memory usage and compile times. There
   are no immediate plans to make changes due to this, but alternate models
   may be explored in the future.

#. Relocations along exceptional paths are currently broken in ToT. In
   particular, there is currently no way to represent a rethrow on a path which
   also has relocations. See `this llvm-dev discussion
   <https://groups.google.com/forum/#!topic/llvm-dev/AE417XjgxvI>`_ for more
   detail.

Bugs and Enhancements
=====================

Currently known bugs and enhancements under consideration can be
tracked by performing a `bugzilla search
<https://bugs.llvm.org/buglist.cgi?cmdtype=runnamed&namedcmd=Statepoint%20Bugs&list_id=64342>`_
for [Statepoint] in the summary field. When filing new bugs, please
use this tag so that interested parties see the newly filed bug. As
with most LLVM features, design discussions take place on `llvm-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_, and patches
should be sent to `llvm-commits
<http://lists.llvm.org/mailman/listinfo/llvm-commits>`_ for review.