=====================================
Garbage Collection Safepoints in LLVM
=====================================

.. contents::
   :local:
   :depth: 2

Status
=======

This document describes a set of experimental extensions to LLVM. Use
with caution. Because the intrinsics have experimental status,
compatibility across LLVM releases is not guaranteed.

LLVM currently supports an alternate mechanism for conservative
garbage collection support using the gc_root intrinsic. The mechanism
described here has little in common with that alternate
implementation, and it is hoped that this mechanism will eventually
replace the gc_root mechanism.

Overview
========

To collect dead objects, garbage collectors must be able to identify
any references to objects contained within executing code, and,
depending on the collector, potentially update them. The collector
does not need this information at all points in code - that would make
the problem much harder - but only at well-defined points in the
execution known as 'safepoints'. For most collectors, it is sufficient
to track at least one copy of each unique pointer value. However, for
a collector which wishes to relocate objects directly reachable from
running code, a higher standard is required.

One additional challenge is that the compiler may compute intermediate
results ("derived pointers") which point outside of the allocation or
even into the middle of another allocation. The eventual use of this
intermediate value must yield an address within the bounds of the
allocation, but such "exterior derived pointers" may be visible to the
collector. Given this, a garbage collector cannot safely rely on the
runtime value of an address to indicate the object it is associated
with. If the garbage collector wishes to move any object, the
compiler must provide a mapping, for each pointer, to an indication of
its allocation.

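To make the derived pointer problem concrete, the following Python
sketch models a tiny heap as address ranges. It is purely illustrative
and not part of LLVM: the ``Obj`` class and the ``guess_owner`` and
``relocate_derived`` helpers are hypothetical names invented here. It
suggests why an address-range lookup can attribute an exterior derived
pointer to the wrong object, while a (base, derived) pair can be
relocated by preserving the offset from the base.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List, Optional


   @dataclass
   class Obj:
       """A hypothetical heap object occupying [start, start + size)."""
       name: str
       start: int
       size: int


   def guess_owner(heap: List[Obj], addr: int) -> Optional[str]:
       """Guess the owning object purely from the runtime address."""
       for obj in heap:
           if obj.start <= addr < obj.start + obj.size:
               return obj.name
       return None


   def relocate_derived(old_base: int, new_base: int, derived: int) -> int:
       """Relocate a derived pointer by keeping its offset from the base."""
       return new_base + (derived - old_base)


   # Two adjacent 16-byte objects.
   heap = [Obj("A", 0x1000, 16), Obj("B", 0x1010, 16)]

   # Assume the compiler hoisted "A + 24" out of a loop that later
   # subtracts 16: the intermediate value is an exterior derived pointer.
   base_a = 0x1000
   derived = base_a + 24

   # An address-range lookup attributes the pointer to B, which is wrong.
   wrong_owner = guess_owner(heap, derived)

   # With the (base, derived) mapping, moving A to 0x8000 works as intended.
   new_derived = relocate_derived(base_a, 0x8000, derived)

In this toy model the eventual use, ``new_derived - 16``, lands inside
the moved copy of ``A``, which is the behaviour the base pointer
mapping is meant to guarantee.
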
To simplify the interaction between a collector and the compiled code,
most garbage collectors are organized in terms of three abstractions:
load barriers, store barriers, and safepoints.

#. A load barrier is a bit of code executed immediately after the
   machine load instruction, but before any use of the value loaded.
   Depending on the collector, such a barrier may be needed for all
   loads, merely loads of a particular type (in the original source
   language), or none at all.

#. Analogously, a store barrier is a code fragment that runs
   immediately before the machine store instruction, but after the
   computation of the value stored. The most common use of a store
   barrier is to update a 'card table' in a generational garbage
   collector.

#. A safepoint is a location at which pointers visible to the compiled
   code (i.e. currently in registers or on the stack) are allowed to
   change. After the safepoint completes, the actual pointer value
   may differ, but the 'object' (as seen by the source language)
   pointed to will not.

   Note that the term 'safepoint' is somewhat overloaded. It refers to
   both the location at which the machine state is parsable and the
   coordination protocol involved in bringing application threads to a
   point at which the collector can safely use that information. The
   term "statepoint" as used in this document refers exclusively to the
   former.

This document focuses on the last item - compiler support for
safepoints in generated code. We will assume that an outside
mechanism has decided where to place safepoints. From our
perspective, all safepoints will be function calls. To support
relocation of objects directly reachable from values in compiled code,
the collector must be able to:

#. identify every copy of a pointer (including copies introduced by
   the compiler itself) at the safepoint,
#. identify which object each pointer relates to, and
#. potentially update each of those copies.

This document describes the mechanism by which an LLVM based compiler
can provide this information to a language runtime/collector, and
ensure that all pointers can be read and updated if desired. The
heart of the approach is to construct (or rewrite) the IR in a manner
where the possible updates performed by the garbage collector are
explicitly visible in the IR. Doing so requires that we:

#. create a new SSA value for each potentially relocated pointer, and
   ensure that no use of the original (non-relocated) value is
   reachable after the safepoint,
#. specify the relocation in a way which is opaque to the compiler to
   ensure that the optimizer cannot introduce new uses of an
   unrelocated value after a statepoint (this prevents the optimizer
   from performing unsound optimizations), and
#. record a mapping of live pointers (and the allocation they're
   associated with) for each statepoint.

At the most abstract level, inserting a safepoint can be thought of as
replacing a call instruction with a call to a multiple return value
function which calls the original target of the call, returns
its result, and returns updated values for any live pointers to
garbage collected objects.

Note that the task of identifying all live pointers to garbage
collected values, transforming the IR to expose a pointer giving the
base object for every such live pointer, and inserting all the
intrinsics correctly is explicitly out of scope for this document.
The recommended approach is described in the section on Late
Safepoint Placement below.

This abstract function call is concretely represented by a sequence of
intrinsic calls known as a 'statepoint sequence'.

Let's consider a simple call in LLVM IR:

  todo

Depending on our language we may need to allow a safepoint during the
execution of the function called from this site. If so, we need to
let the collector update local values in the current frame.

Let's say we need to relocate SSA values 'a', 'b', and 'c' at this
safepoint. To represent this, we would generate the statepoint
sequence:

  todo

Ideally, this sequence would have been represented as an M argument, N
return value function (where M is the number of values being
relocated + the original call arguments and N is the original return
value + each relocated value), but LLVM does not easily support such a
representation.

Instead, the statepoint intrinsic marks the actual site of the
safepoint or statepoint. The statepoint returns a token value (which
exists only at compile time). To get back the original return value
of the call, we use the 'gc.result' intrinsic. To get the relocation
of each pointer in turn, we use the 'gc.relocate' intrinsic with the
appropriate index. Note that both the gc.relocate and gc.result are
tied to the statepoint. The combination forms a "statepoint sequence"
and represents the entirety of a parseable call or 'statepoint'.

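As a rough mental model only, the shape of a statepoint sequence can
be imitated in Python. Nothing here is LLVM API: ``Token``,
``statepoint``, ``gc_result`` and ``gc_relocate`` are hypothetical
stand-ins, the collector is a dictionary of moves, and ``gc_relocate``
takes a single index into the gc pointer list rather than the separate
base and pointer indices of the real intrinsic. The sketch only handles
base pointers.

.. code-block:: python

   from typing import Any, Callable, Dict, List, Tuple


   class Token:
       """Hypothetical stand-in for the compile-time-only statepoint token."""

       def __init__(self, result: Any, relocated: List[int]) -> None:
           self.result = result
           self.relocated = relocated


   def statepoint(target: Callable[..., Any], call_args: Tuple[Any, ...],
                  gc_params: List[int], moves: Dict[int, int]) -> Token:
       """Call the target, then let a toy collector relocate each pointer.

       `moves` maps an old address to a new one; other pointers are kept.
       """
       result = target(*call_args)
       relocated = [moves.get(ptr, ptr) for ptr in gc_params]
       return Token(result, relocated)


   def gc_result(token: Token) -> Any:
       """Project the original call's return value out of the token."""
       return token.result


   def gc_relocate(token: Token, index: int) -> int:
       """Project the possibly updated pointer at `index` out of the token."""
       return token.relocated[index]


   # Relocate 'a', 'b' and 'c' across a call to a hypothetical callee.
   a, b, c = 0x1000, 0x2000, 0x3000
   tok = statepoint(lambda x: x + 1, (41,), [a, b, c], moves={0x2000: 0x9000})
   ret = gc_result(tok)
   a1, b1, c1 = (gc_relocate(tok, i) for i in range(3))

After the call, code in this model uses only ``ret``, ``a1``, ``b1``
and ``c1``; the original ``a``, ``b`` and ``c`` are treated as dead,
mirroring the requirement that unrelocated values are not used after
the safepoint.
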
When lowered, this example would generate the following x86 assembly::

  put assembly here

Each of the potentially relocated values has been spilled to the
stack, and that location has been recorded in the
:ref:`Stack Map section <stackmap-section>`. If the garbage collector
needs to update any of these pointers during the call, it knows
exactly what to change.

Intrinsics
===========

``gc.statepoint`` Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare i32
        @gc.statepoint(func_type <target>, i64 <# call args>,
                       i64 <unused>, ... (call parameters),
                       i64 <# deopt args>, ... (deopt parameters),
                       ... (gc parameters))

Overview:
"""""""""

The statepoint intrinsic represents a call which is parse-able by the
runtime.

Operands:
"""""""""

The 'target' operand is the function actually being called. The
target can be specified as either a symbolic LLVM function, or as an
arbitrary Value of appropriate function type. Note that the function
type must match the signature of the callee and the types of the 'call
parameters' arguments.

The '# call args' operand is the number of arguments to the actual
call. It must exactly match the number of arguments passed in the
'call parameters' variable length section.

The 'unused' operand is unused and likely to be removed. Please do
not use.

The 'call parameters' arguments are simply the arguments which need to
be passed to the call target. They will be lowered according to the
specified calling convention and otherwise handled like a normal call
instruction. The number of arguments must exactly match what is
specified in '# call args'. The types must match the signature of
'target'.

The 'deopt parameters' arguments contain an arbitrary list of Values
which is meaningful to the runtime. The runtime may read any of these
values, but is assumed not to modify them. If the garbage collector
might need to modify one of these values, it must also be listed in
the 'gc parameters' argument list. The '# deopt args' field indicates
how many operands are to be interpreted as 'deopt parameters'.

The 'gc parameters' arguments contain every pointer to a garbage
collected object which potentially needs to be updated by the garbage
collector. Note that the argument list must explicitly contain a base
pointer for every derived pointer listed. The order of arguments is
unimportant. Unlike the other variable length parameter sets, this
list is not length prefixed.

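The operand layout described above can be sketched as a small Python
helper that splits a flat operand list into its three variable length
sections. This is an illustration under the assumption that operands
appear exactly in the order given in the Syntax section;
``split_statepoint_operands`` and the string operand names are
hypothetical and do not correspond to any LLVM API.

.. code-block:: python

   from typing import Any, Dict, List


   def split_statepoint_operands(operands: List[Any]) -> Dict[str, Any]:
       """Split a flat operand list using the layout in the Syntax section.

       Assumed layout: target, # call args, unused, call parameters...,
       # deopt args, deopt parameters..., gc parameters...
       """
       target, num_call_args = operands[0], operands[1]
       call_start = 3  # skip target, '# call args' and 'unused'
       call_end = call_start + num_call_args
       if call_end >= len(operands):
           raise ValueError("'# call args' runs past '# deopt args'")

       num_deopt_args = operands[call_end]
       deopt_start = call_end + 1
       gc_start = deopt_start + num_deopt_args
       if gc_start > len(operands):
           raise ValueError("'# deopt args' runs past the operand list")

       return {
           "target": target,
           "call": operands[call_start:call_end],
           "deopt": operands[deopt_start:gc_start],
           # Not length prefixed: everything that remains is a gc parameter.
           "gc": operands[gc_start:],
           "gc_start": gc_start,
       }


   # target, 2 call args, unused, %x, %y, 1 deopt arg, %d0, gc parameters.
   ops = ["@foo", 2, 0, "%x", "%y", 1, "%d0", "%base", "%derived"]
   parts = split_statepoint_operands(ops)

Because the gc section has no length prefix, its extent in this sketch
is simply whatever follows the deopt parameters.
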
Semantics:
""""""""""

A statepoint is assumed to read and write all memory. As a result,
memory operations cannot be reordered past a statepoint. It is
illegal to mark a statepoint as being either 'readonly' or 'readnone'.

Note that legal IR cannot perform any memory operation on a 'gc
parameters' argument of the statepoint in a location statically
reachable from the statepoint. Instead, the explicitly relocated value
(from a ``gc.relocate``) must be used.

``gc.result`` Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare type*
        @gc.result(i32 %statepoint_token)

Overview:
"""""""""

``gc.result`` extracts the result of the original call instruction
which was replaced by the ``gc.statepoint``. The ``gc.result``
intrinsic is actually a family of three intrinsics due to an
implementation limitation. Other than the type of the return value,
the semantics are the same.

Operands:
"""""""""

The first and only argument is the ``gc.statepoint`` which starts
the safepoint sequence of which this ``gc.result`` is a part.
Despite the typing of this as a generic i32, *only* the value defined
by a ``gc.statepoint`` is legal here.

Semantics:
""""""""""

The ``gc.result`` represents the return value of the call target of
the ``gc.statepoint``. The type of the ``gc.result`` must exactly
match the return type of the target. If the call target returns void,
there will be no ``gc.result``.

A ``gc.result`` is modeled as a 'readnone' pure function. It has no
side effects since it is just a projection of the return value of the
previous call represented by the ``gc.statepoint``.

``gc.relocate`` Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare <type> addrspace(1)*
        @gc.relocate(i32 %statepoint_token, i32 %base_offset, i32 %pointer_offset)

Overview:
"""""""""

A ``gc.relocate`` returns the potentially relocated value of a pointer
at the safepoint.

Operands:
"""""""""

The first argument is the ``gc.statepoint`` which starts the
safepoint sequence of which this ``gc.relocate`` is a part.
Despite the typing of this as a generic i32, *only* the value defined
by a ``gc.statepoint`` is legal here.

The second argument is an index into the statepoint's list of arguments
which specifies the base pointer for the pointer being relocated.
This index must land within the 'gc parameters' section of the
statepoint's argument list.

The third argument is an index into the statepoint's list of arguments
which specifies the (potentially) derived pointer being relocated. It
is legal for this index to be the same as the second argument
if-and-only-if a base pointer is being relocated. This index must land
within the 'gc parameters' section of the statepoint's argument list.

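The index rules for the second and third arguments can be illustrated
with a short Python check. It is a sketch, not the LLVM verifier: it
assumes the statepoint operands are laid out in the order given in the
``gc.statepoint`` Syntax section, and ``gc_section_start`` and
``check_relocate`` are hypothetical helper names.

.. code-block:: python

   from typing import Any, List


   def gc_section_start(operands: List[Any]) -> int:
       """Index of the first gc parameter in a flat statepoint operand list.

       Assumed layout: target, # call args, unused, call parameters...,
       # deopt args, deopt parameters..., gc parameters...
       """
       num_call_args = operands[1]
       deopt_count_index = 3 + num_call_args
       return deopt_count_index + 1 + operands[deopt_count_index]


   def check_relocate(operands: List[Any], base_index: int,
                      pointer_index: int) -> bool:
       """Return True if both indices land in the gc parameters section."""
       start, end = gc_section_start(operands), len(operands)
       return start <= base_index < end and start <= pointer_index < end


   # target, 1 call arg, unused, %x, 0 deopt args, then two gc parameters.
   ops = ["@foo", 1, 0, "%x", 0, "%base", "%derived"]
   first_gc = gc_section_start(ops)          # index of %base
   derived_ok = check_relocate(ops, 5, 6)    # relocate %derived, based on %base
   base_ok = check_relocate(ops, 5, 5)       # relocate the base pointer itself
   call_arg_bad = check_relocate(ops, 5, 3)  # %x is a call parameter, not gc

The check covers only the range restriction. Whether equal indices
really denote a base pointer cannot be decided from the operand list
alone, so that rule is left out of the sketch.
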
Semantics:
""""""""""

The return value of ``gc.relocate`` is the potentially relocated value
of the pointer specified by its arguments. It is unspecified how the
value of the returned pointer relates to the argument to the
``gc.statepoint`` other than that a) it points to the same source
language object with the same offset, and b) the 'based-on'
relationship of the newly relocated pointers is a projection of the
unrelocated pointers. In particular, the integer value of the pointer
returned is unspecified.

A ``gc.relocate`` is modeled as a 'readnone' pure function. It has no
side effects since it is just a way to extract information about work
done during the actual call modeled by the ``gc.statepoint``.

Stack Map Format
================

Locations for each pointer value which may need to be read and/or
updated by the runtime or collector are provided via the :ref:`Stack
Map format <stackmap-format>` specified in the PatchPoint documentation.

Each statepoint generates the following Locations:

* Constant which describes number of following deopt *Locations* (not
  operands)
* Variable number of Locations, one for each deopt parameter listed in
  the IR statepoint (same number as described by previous Constant)
* Variable number of Location pairs, one pair for each unique pointer
  which needs to be relocated. The first Location in each pair
  describes the base pointer for the object. The second is the derived
  pointer actually being relocated. The base pointer is guaranteed to
  also appear explicitly as a relocation pair if it is used after the
  statepoint. There may be fewer pairs than gc parameters in the IR
  statepoint. Each *unique* pair will occur at least once; duplicates
  are possible.

Note that the Locations used in each section may describe the same
physical location. For example, a stack slot may appear as a deopt
location, a gc base pointer, and a gc derived pointer.

The ID field of the 'StkMapRecord' for a statepoint is meaningless and
its value is explicitly unspecified.

The LiveOut section of the StkMapRecord will be empty for a statepoint
record.

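A runtime consuming such a record might walk the Locations roughly as
follows. This Python sketch is an assumption-laden illustration: a
Location is modelled as a ``(kind, value)`` tuple rather than the
binary encoding, and ``parse_statepoint_locations`` is a hypothetical
helper, not something provided by LLVM.

.. code-block:: python

   from typing import List, Tuple

   # A Location is modelled as (kind, value), e.g. ("Constant", 2) or
   # ("Indirect", 16) for a stack slot at offset 16. This encoding is a
   # simplification invented for the sketch.
   Location = Tuple[str, int]


   def parse_statepoint_locations(
       locs: List[Location],
   ) -> Tuple[List[Location], List[Tuple[Location, Location]]]:
       """Split a record's Locations into deopt entries and unique gc pairs."""
       kind, num_deopt = locs[0]
       if kind != "Constant":
           raise ValueError("first Location must be the deopt count Constant")

       deopt = locs[1:1 + num_deopt]
       rest = locs[1 + num_deopt:]
       if len(deopt) != num_deopt or len(rest) % 2 != 0:
           raise ValueError("malformed statepoint record")

       pairs: List[Tuple[Location, Location]] = []
       for i in range(0, len(rest), 2):
           pair = (rest[i], rest[i + 1])  # (base, derived)
           if pair not in pairs:          # duplicates are possible; keep one
               pairs.append(pair)
       return deopt, pairs


   record = [
       ("Constant", 1),                     # one deopt Location follows
       ("Indirect", 8),                     # deopt value in a stack slot
       ("Indirect", 16), ("Indirect", 16),  # base pointer relocating itself
       ("Indirect", 16), ("Indirect", 24),  # derived pointer based on slot 16
       ("Indirect", 16), ("Indirect", 24),  # duplicate pair
   ]
   deopt, pairs = parse_statepoint_locations(record)

Dropping duplicate pairs is a choice made for this sketch; a collector
could equally process every pair as long as each slot is updated only
once.
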
Safepoint Semantics & Verification
==================================

The fundamental correctness property of the compiled code with
respect to the garbage collector is a dynamic one. It must be
the case that there is no dynamic trace such that an operation
involving a potentially relocated pointer is observably-after a
safepoint which could relocate it. 'observably-after' in this usage
means that an outside observer could observe this sequence of events
in a way which precludes the operation being performed before the
safepoint.

To understand why this 'observably-after' property is required,
consider a null comparison performed on the original copy of a
relocated pointer. Assuming that control flow follows the safepoint,
there is no way to observe externally whether the null comparison is
performed before or after the safepoint. (Remember, the original
Value is unmodified by the safepoint.) The compiler is free to make
either scheduling choice.

The actual correctness property implemented is slightly stronger than
this. We require that there be no *static path* on which an operation
involving a potentially relocated pointer is 'observably-after' a
safepoint which could relocate it. This is slightly stronger than is
strictly necessary (and thus may disallow some otherwise valid
programs), but greatly simplifies reasoning about correctness of the
compiled code.

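The static path formulation lends itself to a simple reachability
check, sketched below in Python over a toy control flow graph. This is
not the verifier discussed in this document: ``find_stale_uses`` and
the ``("statepoint", value)`` / ``("use", value)`` instruction encoding
are hypothetical, and the sketch conservatively treats every use as
observable, so it would also flag the null comparison discussed above.

.. code-block:: python

   from typing import Dict, List, Set, Tuple

   # Each block is a list of (op, value) instructions, where op is
   # "statepoint" (value = the pointer it may relocate) or "use".
   Block = List[Tuple[str, str]]


   def find_stale_uses(blocks: Dict[str, Block], succs: Dict[str, List[str]],
                       entry: str) -> Set[Tuple[str, str]]:
       """Return (block, value) for uses statically reachable after a
       statepoint that may have relocated that value."""
       stale_in: Dict[str, Set[str]] = {name: set() for name in blocks}
       violations: Set[Tuple[str, str]] = set()
       visited: Set[str] = set()
       worklist = [entry]
       while worklist:
           name = worklist.pop()
           visited.add(name)
           stale = set(stale_in[name])
           for op, value in blocks[name]:
               if op == "statepoint":
                   stale.add(value)
               elif op == "use" and value in stale:
                   violations.add((name, value))
           for succ in succs.get(name, []):
               # Re-queue a successor whenever it learns of new stale values.
               if succ not in visited or not stale <= stale_in[succ]:
                   stale_in[succ] |= stale
                   worklist.append(succ)
       return violations


   blocks = {
       "entry": [("use", "%p"), ("statepoint", "%p")],
       "then": [("use", "%p.relocated")],
       "else": [("use", "%p")],  # stale: %p may have moved
   }
   succs = {"entry": ["then", "else"]}
   bad = find_stale_uses(blocks, succs, "entry")

Only the use in ``else`` is reported here: the use in ``entry``
precedes the statepoint, and ``then`` uses the relocated value.
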
By construction, this property will be upheld by the optimizer if
correctly established in the source IR. This is a key invariant of
the design.

The existing IR Verifier pass has been extended to check most of the
local restrictions on the intrinsics mentioned in their respective
documentation. The current implementation in LLVM does not check the
key relocation invariant, but there is ongoing work on developing such
a verifier. Please ask on llvmdev if you're interested in
experimenting with the current version.

Bugs and Enhancements
=====================

Currently known bugs and enhancements under consideration can be
tracked by performing a `bugzilla search
<http://llvm.org/bugs/buglist.cgi?cmdtype=runnamed&namedcmd=Statepoint%20Bugs&list_id=64342>`_
for [Statepoint] in the summary field. When filing new bugs, please
use this tag so that interested parties see the newly filed bug. As
with most LLVM features, design discussions take place on `llvmdev
<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_, and patches
should be sent to `llvm-commits
<http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits>`_ for review.