=============
Type Metadata
=============
|
Type metadata is a mechanism that allows IR modules to co-operatively build
pointer sets corresponding to addresses within a given set of globals. LLVM's
`control flow integrity`_ implementation uses this metadata to efficiently
check (at each call site) that a given address corresponds to either a
valid vtable or function pointer for a given class or function type, and its
whole-program devirtualization pass uses the metadata to identify potential
callees for a given virtual call.
|
To use the mechanism, a client creates metadata nodes with two elements:

1. a byte offset into the global (generally zero for functions)
2. a metadata object representing an identifier for the type
|
These metadata nodes are associated with globals by using global object
metadata attachments with the ``!type`` metadata kind.
|
Each type identifier must exclusively identify either global variables
or functions.
|
.. admonition:: Limitation

  The current implementation only supports attaching metadata to functions on
  the x86-32 and x86-64 architectures.
|
An intrinsic, :ref:`llvm.type.test <type.test>`, is used to test whether a
given pointer is associated with a type identifier.
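
The bookkeeping described above can be modeled in a few lines of C++. This is a
sketch, not LLVM's API: ``TypeEntry``, ``Global``, ``buildPointerSets`` and
``typeTest`` are hypothetical names, type identifiers are plain strings, and an
address is represented as a (global, byte offset) pair rather than a machine
address. It shows how per-global ``!type`` entries combine into one pointer set
per type identifier, which is the set that ``llvm.type.test`` checks against.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <set>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical model of one `!type` attachment: a byte offset into the
  // global and a type identifier (a plain string in this sketch).
  struct TypeEntry {
    uint64_t Offset;
    std::string TypeId;
  };

  // Hypothetical model of a global object and its `!type` attachments.
  struct Global {
    std::string Name;
    bool IsFunction;
    std::vector<TypeEntry> Types;
  };

  // An address is modeled as (global name, byte offset into that global).
  using Address = std::pair<std::string, uint64_t>;
  using PointerSets = std::map<std::string, std::set<Address>>;

  // Collects, for each type identifier, the addresses that a type test
  // against that identifier should accept. Also checks that no identifier
  // is attached to both functions and global variables.
  PointerSets buildPointerSets(const std::vector<Global> &Globals) {
    PointerSets Sets;
    std::map<std::string, bool> IdentifiesFunctions;
    for (const Global &G : Globals) {
      for (const TypeEntry &T : G.Types) {
        auto [It, Inserted] = IdentifiesFunctions.emplace(T.TypeId, G.IsFunction);
        assert((Inserted || It->second == G.IsFunction) &&
               "type identifier mixes functions and global variables");
        (void)It;
        (void)Inserted;
        Sets[T.TypeId].insert({G.Name, T.Offset});
      }
    }
    return Sets;
  }

  // Model-level analogue of llvm.type.test: is Addr in TypeId's pointer set?
  bool typeTest(const PointerSets &Sets, const Address &Addr,
                const std::string &TypeId) {
    auto It = Sets.find(TypeId);
    return It != Sets.end() && It->second.count(Addr) != 0;
  }

Note that a global whose only entry has offset 4 contributes just that interior
address to the pointer set, not its start address.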
|
.. _control flow integrity: http://clang.llvm.org/docs/ControlFlowIntegrity.html
|
Representing Type Information using Type Metadata
=================================================
|
This section describes how Clang represents C++ type information associated with
virtual tables using type metadata.
|
Consider the following inheritance hierarchy:
|
.. code-block:: c++

  struct A {
    virtual void f();
  };

  struct B : A {
    virtual void f();
    virtual void g();
  };

  struct C {
    virtual void h();
  };

  struct D : A, C {
    virtual void f();
    virtual void h();
  };
|
The virtual table objects for A, B, C and D look like this (under the Itanium ABI):
|
.. csv-table:: Virtual Table Layout for A, B, C, D
  :header: Class, 0, 1, 2, 3, 4, 5, 6

  A, A::offset-to-top, &A::rtti, &A::f
  B, B::offset-to-top, &B::rtti, &B::f, &B::g
  C, C::offset-to-top, &C::rtti, &C::h
  D, D::offset-to-top, &D::rtti, &D::f, &D::h, D::offset-to-top, &D::rtti, thunk for &D::h
|
When an object of type A is constructed, the address of ``&A::f`` in A's
virtual table object is stored in the object's vtable pointer. In ABI parlance
this address is known as an `address point`_. Similarly, when an object of type
B is constructed, the address of ``&B::f`` is stored in the vtable pointer. In
this way, the vtable in B's virtual table object is compatible with A's vtable.
|
D is a little more complicated, due to the use of multiple inheritance. Its
virtual table object contains two vtables, one compatible with A's vtable and
the other compatible with C's vtable. Objects of type D contain two virtual
pointers, one belonging to the A subobject and containing the address of
the vtable compatible with A's vtable, and the other belonging to the C
subobject and containing the address of the vtable compatible with C's vtable.
|
The full set of compatibility information for the above class hierarchy is
shown below. The following table shows the name of a class, the offset of an
address point within that class's vtable and the name of one of the classes
with which that address point is compatible.
|
.. csv-table:: Type Offsets for A, B, C, D
  :header: VTable for, Offset, Compatible Class

  A, 16, A
  B, 16, A
   , , B
  C, 16, C
  D, 16, A
   , , D
   , 48, C
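
The Offset column follows from the slot layout in the first table. The C++
sketch below assumes, as the table does, a 64-bit Itanium ABI target with
8-byte vtable slots; ``SlotKind`` and ``addressPoints`` are hypothetical names.
Every vtable embedded in a virtual table object starts with an offset-to-top
slot and an RTTI slot, and its address point is the slot that follows them.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <vector>

  // Assumption: a 64-bit Itanium ABI target, where each vtable slot
  // (offset-to-top, RTTI pointer or function pointer) is 8 bytes wide.
  constexpr std::size_t SlotSize = 8;

  enum class SlotKind { OffsetToTop, RTTI, Function };

  // Returns the byte offset of every address point in a virtual table
  // object. Each embedded vtable begins with an offset-to-top slot and an
  // RTTI slot; its address point is the slot immediately after them.
  std::vector<std::size_t> addressPoints(const std::vector<SlotKind> &Slots) {
    std::vector<std::size_t> Points;
    for (std::size_t I = 0; I + 1 < Slots.size(); ++I) {
      if (Slots[I] != SlotKind::OffsetToTop)
        continue;
      assert(Slots[I + 1] == SlotKind::RTTI &&
             "offset-to-top must be followed by an RTTI slot");
      Points.push_back((I + 2) * SlotSize);
    }
    return Points;
  }

For D this gives 16 (slot 2) for the vtable compatible with A and D, and 48
(slot 6) for the vtable compatible with C.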
|
The next step is to encode this compatibility information into the IR. This is
done by creating type metadata named after each of the compatible classes and
associating with it each compatible address point in each vtable. For example,
these type metadata entries encode the compatibility information for the above
hierarchy:
|
::

    @_ZTV1A = constant [...], !type !0
    @_ZTV1B = constant [...], !type !0, !type !1
    @_ZTV1C = constant [...], !type !2
    @_ZTV1D = constant [...], !type !0, !type !3, !type !4

    !0 = !{i64 16, !"_ZTS1A"}
    !1 = !{i64 16, !"_ZTS1B"}
    !2 = !{i64 16, !"_ZTS1C"}
    !3 = !{i64 16, !"_ZTS1D"}
    !4 = !{i64 48, !"_ZTS1C"}
|
With this type metadata, we can now use the ``llvm.type.test`` intrinsic to
test whether a given pointer is compatible with a type identifier. Working
backwards, if ``llvm.type.test`` returns true for a particular pointer,
we can also statically determine the identities of the virtual functions
that a particular virtual call may call. For example, if a program assumes
a pointer to be a member of ``!"_ZTS1A"``, we know that the address can
only be one of ``_ZTV1A+16``, ``_ZTV1B+16`` or ``_ZTV1D+16`` (i.e. the
address points of the vtables of A, B and D respectively). If we then load
an address from that pointer, we know that the address can only be one of
``&A::f``, ``&B::f`` or ``&D::f``.
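
The reasoning in the previous paragraph can be sketched in C++ as a query over
the ``!type`` entries listed earlier. This is an illustrative model, not LLVM's
devirtualization pass: ``TypeAttachment``, ``VTableGlobal``,
``compatibleAddressPoints`` and ``possibleCallees`` are hypothetical names,
vtables are described by symbolic slot contents, and slots are assumed to be
8 bytes wide.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <set>
  #include <string>
  #include <utility>
  #include <vector>

  // Assumption: 64-bit target, so every vtable slot is 8 bytes wide.
  constexpr uint64_t SlotSize = 8;

  // One `!type` attachment on a vtable global: (byte offset, type id).
  struct TypeAttachment {
    uint64_t Offset;
    std::string TypeId;
  };

  // Hypothetical model of a vtable global: its `!type` attachments and the
  // symbolic contents of each slot.
  struct VTableGlobal {
    std::vector<TypeAttachment> Types;
    std::vector<std::string> Slots;
  };

  using VTableMap = std::map<std::string, VTableGlobal>;
  using AddressPoint = std::pair<std::string, uint64_t>; // (global, offset)

  // If a type test against TypeId is known to succeed, the tested pointer
  // can only be one of these address points.
  std::set<AddressPoint> compatibleAddressPoints(const VTableMap &VTables,
                                                 const std::string &TypeId) {
    std::set<AddressPoint> Result;
    for (const auto &[Name, VT] : VTables)
      for (const TypeAttachment &T : VT.Types)
        if (T.TypeId == TypeId)
          Result.insert({Name, T.Offset});
    return Result;
  }

  // Loading a function pointer at ByteOffset from such a pointer can then
  // only produce these slot contents: the possible callees.
  std::set<std::string> possibleCallees(const VTableMap &VTables,
                                        const std::string &TypeId,
                                        uint64_t ByteOffset) {
    std::set<std::string> Callees;
    for (const auto &[Name, Offset] : compatibleAddressPoints(VTables, TypeId)) {
      const VTableGlobal &VT = VTables.at(Name);
      uint64_t Pos = Offset + ByteOffset;
      assert(Pos % SlotSize == 0 && Pos / SlotSize < VT.Slots.size());
      Callees.insert(VT.Slots[Pos / SlotSize]);
    }
    return Callees;
  }

With the A, B, C and D vtables entered as in the tables above, querying
``_ZTS1A`` at byte offset 0 yields ``&A::f``, ``&B::f`` and ``&D::f``, while
the same query for ``_ZTS1C`` yields ``&C::h`` and the thunk for ``&D::h``.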
|
.. _address point: https://mentorembedded.github.io/cxx-abi/abi.html#vtable-general
|
Testing Addresses For Type Membership
=====================================
|
If a program tests an address using ``llvm.type.test``, this will cause
a link-time optimization pass, ``LowerTypeTests``, to replace calls to this
intrinsic with efficient code to perform type membership tests. At a high level,
the pass will lay out referenced globals in a consecutive memory region in
the object file, construct bit vectors that map onto that memory region,
and generate code at each of the ``llvm.type.test`` call sites to test
pointers against those bit vectors. Because of the layout manipulation, the
globals' definitions must be available at LTO time. For more information,
see the `control flow integrity design document`_.
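
A minimal sketch of the bit vector check, assuming the globals have already
been laid out at known offsets in one region. It is not the ``LowerTypeTests``
implementation: ``LaidOutGlobal``, ``BitSetInfo``, ``buildBitSet`` and
``lowerTypeTest`` are hypothetical names, and addresses are offsets into the
combined region. The alignment is taken as the largest power of two dividing
every member's distance from the lowest member, so the bit vector needs only
one bit per aligned position.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical record for one global after layout: its start offset in
  // the combined memory region and its `!type` entries (offset, type id).
  struct LaidOutGlobal {
    std::string Name;
    uint64_t Start;
    std::vector<std::pair<uint64_t, std::string>> Types;
  };

  // What a lowered type test needs: the lowest member address, a
  // power-of-two alignment, and one bit per aligned position from Base.
  struct BitSetInfo {
    uint64_t Base = 0;
    uint64_t Align = 1;
    std::vector<bool> Bits;
  };

  BitSetInfo buildBitSet(const std::vector<LaidOutGlobal> &Globals,
                         const std::string &TypeId) {
    std::vector<uint64_t> Members;
    for (const LaidOutGlobal &G : Globals)
      for (const auto &[Offset, Id] : G.Types)
        if (Id == TypeId)
          Members.push_back(G.Start + Offset);

    BitSetInfo Info;
    if (Members.empty())
      return Info; // no members: every test fails
    Info.Base = *std::min_element(Members.begin(), Members.end());

    // The lowest set bit of the OR of all distances from Base is the
    // largest power of two that divides every distance.
    uint64_t Distances = 0;
    for (uint64_t M : Members)
      Distances |= M - Info.Base;
    Info.Align = Distances == 0 ? 1 : (Distances & (~Distances + 1));
    assert((Info.Align & (Info.Align - 1)) == 0);

    uint64_t MaxIndex = 0;
    for (uint64_t M : Members)
      MaxIndex = std::max(MaxIndex, (M - Info.Base) / Info.Align);
    Info.Bits.assign(MaxIndex + 1, false);
    for (uint64_t M : Members)
      Info.Bits[(M - Info.Base) / Info.Align] = true;
    return Info;
  }

  // Roughly the check that replaces a call site, modeled in C++:
  // alignment check, range check, then a bit lookup.
  bool lowerTypeTest(const BitSetInfo &Info, uint64_t Addr) {
    uint64_t Diff = Addr - Info.Base; // wraps to a huge value if Addr < Base
    if (Diff % Info.Align != 0)
      return false;
    uint64_t Index = Diff / Info.Align;
    return Index < Info.Bits.size() && Info.Bits[Index];
  }

Placing ``@a``, ``@b``, ``@c`` and ``@d`` from the example below at offsets 0,
4, 8 and 12 (an assumed order) gives ``typeid2`` a base of 4, an alignment of 4
and the bits 1, 1, 0, 1 (index 0 first), which reproduces the results annotated
in ``@main``. The control flow integrity design document describes combining
the alignment and range checks using a bitwise rotate; this sketch keeps them
separate for readability.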
|
A type identifier that identifies functions is transformed into a jump table,
which is a block of code consisting of one branch instruction for each of the
functions associated with the type identifier; each branch transfers control
to its target function. The pass will redirect any taken function addresses
to the corresponding jump table entry. In the object file's symbol table, the
jump table entries take the identities of the original functions, so that
addresses taken outside the module will pass any verification done inside the
module.
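
The jump table scheme can be sketched in the same way. Assumptions: every
entry is a fixed-size stub of 8 bytes, the table serves a single type
identifier so that every entry is a member, and ``JumpTable``, ``entryAddress``
and ``contains`` are hypothetical names. Under those assumptions the lowered
test needs no bit vector, only an alignment check and a range check.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Assumption: each jump table entry is a fixed-size stub holding one
  // branch to its target; 8 bytes is used here purely for illustration.
  constexpr uint64_t EntrySize = 8;

  // Hypothetical model of a jump table built for a single type identifier.
  struct JumpTable {
    uint64_t Base;                    // address of the first entry
    std::vector<std::string> Targets; // entry I branches to Targets[I]

    // The address that replaces every taken address of Fn.
    uint64_t entryAddress(const std::string &Fn) const {
      for (uint64_t I = 0; I < Targets.size(); ++I)
        if (Targets[I] == Fn)
          return Base + I * EntrySize;
      assert(false && "function has no entry in this jump table");
      return 0;
    }

    // Lowered type test. Because every entry belongs to the type
    // identifier, it reduces to an alignment check and a range check.
    bool contains(uint64_t Addr) const {
      uint64_t Diff = Addr - Base; // wraps to a huge value if Addr < Base
      return Diff % EntrySize == 0 && Diff / EntrySize < Targets.size();
    }
  };

In the example below, ``@e`` and ``@g`` would receive entries for ``typeid3``
while ``@f`` would not, so ``contains`` accepts the redirected addresses of
``@e`` and ``@g`` and rejects any address outside the table. The sketch covers
only the single-identifier case.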
|
Jump table entries may branch to externally defined functions, so those
functions' definitions need not be available at LTO time. Note that if an
externally defined function is associated with a type identifier, there is
no guarantee that its identity within the module will be the same as its
identity outside of the module, as the former will be the jump table entry
if a jump table is necessary.
|
The `GlobalLayoutBuilder`_ class is responsible for laying out the globals
efficiently to minimize the sizes of the underlying bit vectors.
|
.. _control flow integrity design document: http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html
|
:Example:
|
::

    target datalayout = "e-p:32:32"

    @a = internal global i32 0, !type !0
    @b = internal global i32 0, !type !0, !type !1
    @c = internal global i32 0, !type !1
    @d = internal global [2 x i32] [i32 0, i32 0], !type !2

    define void @e() !type !3 {
      ret void
    }

    define void @f() {
      ret void
    }

    declare void @g() !type !3

    !0 = !{i32 0, !"typeid1"}
    !1 = !{i32 0, !"typeid2"}
    !2 = !{i32 4, !"typeid2"}
    !3 = !{i32 0, !"typeid3"}

    declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone

    define i1 @foo(i32* %p) {
      %pi8 = bitcast i32* %p to i8*
      %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1")
      ret i1 %x
    }

    define i1 @bar(i32* %p) {
      %pi8 = bitcast i32* %p to i8*
      %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2")
      ret i1 %x
    }

    define i1 @baz(void ()* %p) {
      %pi8 = bitcast void ()* %p to i8*
      %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3")
      ret i1 %x
    }

    define void @main() {
      %a1 = call i1 @foo(i32* @a) ; returns 1
      %b1 = call i1 @foo(i32* @b) ; returns 1
      %c1 = call i1 @foo(i32* @c) ; returns 0
      %a2 = call i1 @bar(i32* @a) ; returns 0
      %b2 = call i1 @bar(i32* @b) ; returns 1
      %c2 = call i1 @bar(i32* @c) ; returns 1
      %d02 = call i1 @bar(i32* getelementptr ([2 x i32], [2 x i32]* @d, i32 0, i32 0)) ; returns 0
      %d12 = call i1 @bar(i32* getelementptr ([2 x i32], [2 x i32]* @d, i32 0, i32 1)) ; returns 1
      %e = call i1 @baz(void ()* @e) ; returns 1
      %f = call i1 @baz(void ()* @f) ; returns 0
      %g = call i1 @baz(void ()* @g) ; returns 1
      ret void
    }
|
.. _GlobalLayoutBuilder: http://llvm.org/klaus/llvm/blob/master/include/llvm/Transforms/IPO/LowerTypeTests.h