=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

How to build libc++ with DFSan
==============================

DFSan requires either all of your code to be instrumented or for uninstrumented
functions to be listed as ``uninstrumented`` in the `ABI list`_.

If you'd like to have instrumented libc++ functions, then you need to build it
with DFSan instrumentation from source. Here is an example of how to build
libc++ and the libc++ ABI with data flow sanitizer instrumentation.

.. code-block:: console

  mkdir libcxx-build
  cd libcxx-build

  # An example using ninja
  cmake -GNinja -S <monorepo-root>/runtimes \
    -DCMAKE_C_COMPILER=clang \
    -DCMAKE_CXX_COMPILER=clang++ \
    -DLLVM_USE_SANITIZER="DataFlow" \
    -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"

  ninja cxx cxxabi

Note: Ensure you are building with a sufficiently new version of Clang.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior. To use DataFlowSanitizer, the program
uses API functions to apply tags to data to cause it to be tracked, and to
check the tag of a specific data item. DataFlowSanitizer manages
the propagation of tags through the program according to its data flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

.. _ABI list:

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values. The ABI list file also controls
how labels are propagated in the former case. DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux, but it may become necessary for users to extend the ABI list in cases
where a particular library or function cannot be instrumented (e.g. because
it is implemented in assembly or another language which DataFlowSanitizer does
not support) or a function is called from a library or function which cannot
be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI. Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown. The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory). Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function. This function may wrap
  the original function or provide its own implementation. This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior. The signature
  of ``__dfsw_F`` is based on that of ``F``, with a label of type
  ``dfsan_label`` appended to the argument list for each argument. If ``F``
  has a non-void return type, a final argument of type ``dfsan_label *``
  is appended, to which the custom function can store the label for the
  return value. For example:

  .. code-block:: c++

    void f(int x);
    void __dfsw_f(int x, dfsan_label x_label);

    void *memcpy(void *dest, const void *src, size_t n);
    void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                        dfsan_label dest_label, dfsan_label src_label,
                        dfsan_label n_label, dfsan_label *ret_label);

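As an illustration of the ``custom`` signature rule above, the following is a
minimal sketch of a wrapper for a hypothetical uninstrumented function
``my_add``, which would be listed as ``fun:my_add=uninstrumented`` and
``fun:my_add=custom``. The function name and its body are assumptions made for
this example. ``dfsan_label`` is declared locally as an 8-bit stand-in so that
the snippet compiles without the DFSan runtime; real code would include
``sanitizer/dfsan_interface.h`` instead.

.. code-block:: c++

  #include <cstdint>

  // Stand-in for the type from sanitizer/dfsan_interface.h (assumption, so
  // that this sketch builds without the DFSan runtime).
  typedef uint8_t dfsan_label;

  // Hypothetical uninstrumented function that uses the native ABI.
  extern "C" int my_add(int a, int b) { return a + b; }

  // Custom wrapper: the original parameters come first, followed by one label
  // per parameter and, because my_add returns a value, a pointer through which
  // the label of the return value is reported.
  extern "C" int __dfsw_my_add(int a, int b, dfsan_label a_label,
                               dfsan_label b_label, dfsan_label *ret_label) {
    // The result depends on both arguments, so its label is the union of
    // their labels. Labels are bit sets, so the union is a bitwise OR.
    *ret_label = static_cast<dfsan_label>(a_label | b_label);
    return my_add(a, b);
  }

With this wrapper, a call such as ``my_add(x, y)`` in instrumented code would
be routed to ``__dfsw_my_add``, and the result would carry the labels of both
``x`` and ``y``.
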
If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI. Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

For instrumented functions, the ABI list supports a ``force_zero_labels``
category, which makes all stores and return values in the function carry zero
labels. Functions should never be labelled with both ``force_zero_labels``
and ``uninstrumented`` or any of the uninstrumented wrapper kinds.

For example:

.. code-block:: none

  # e.g. void writes_data(char* out_buf, int out_buf_len) {...}
  # Applying force_zero_labels will force out_buf shadow to zero.
  fun:writes_data=force_zero_labels


Compilation Flags
-----------------

* ``-dfsan-abilist`` -- The additional ABI list files that control how shadow
  parameters are passed. File names are separated by commas.
* ``-dfsan-combine-pointer-labels-on-load`` -- Controls whether to include or
  ignore the labels of pointers in load instructions. Its default value is true.
  For example:

  .. code-block:: c++

    v = *p;

  If the flag is true, the label of ``v`` is the union of the label of ``p`` and
  the label of ``*p``. If the flag is false, the label of ``v`` is the label of
  just ``*p``.

* ``-dfsan-combine-pointer-labels-on-store`` -- Controls whether to include or
  ignore the labels of pointers in store instructions. Its default value is
  false. For example:

  .. code-block:: c++

    *p = v;

  If the flag is true, the label of ``*p`` is the union of the label of ``p`` and
  the label of ``v``. If the flag is false, the label of ``*p`` is the label of
  just ``v``.

* ``-dfsan-combine-offset-labels-on-gep`` -- Controls whether to propagate
  labels of offsets in GEP instructions. Its default value is true. For example:

  .. code-block:: c++

    p += i;

  If the flag is true, the label of ``p`` is the union of the label of ``p`` and
  the label of ``i``. If the flag is false, the label of ``p`` is unchanged.

* ``-dfsan-track-select-control-flow`` -- Controls whether to track the control
  flow of select instructions. Its default value is true. For example:

  .. code-block:: c++

    v = b ? v1 : v2;

  If the flag is true, the label of ``v`` is the union of the labels of ``b``,
  ``v1`` and ``v2``. If the flag is false, the label of ``v`` is the union of the
  labels of just ``v1`` and ``v2``.

* ``-dfsan-event-callbacks`` -- An experimental feature that inserts callbacks for
  certain data events. Currently callbacks are only inserted for loads, stores,
  memory transfers (i.e. memcpy and memmove), and comparisons. Its default value
  is false. If this flag is set to true, a user must provide definitions for the
  following callback functions:

  .. code-block:: c++

    void __dfsan_load_callback(dfsan_label Label, void* Addr);
    void __dfsan_store_callback(dfsan_label Label, void* Addr);
    void __dfsan_mem_transfer_callback(dfsan_label *Start, size_t Len);
    void __dfsan_cmp_callback(dfsan_label CombinedLabel);

* ``-dfsan-conditional-callbacks`` -- An experimental feature that inserts
  callbacks for control flow conditional expressions.
  This can be used to find where tainted values can control execution.

  In addition to this compilation flag, a callback handler must be registered
  using ``dfsan_set_conditional_callback(my_callback);``, where ``my_callback``
  is a function with a signature matching
  ``void my_callback(dfsan_label l, dfsan_origin o);``.
  This signature is the same when origin tracking is disabled; in this case
  the ``dfsan_origin`` passed in will always be 0.

  The callback will only be called when a tainted value reaches a conditional
  expression for control flow (such as an if's condition).
  The callback will be skipped for conditional expressions inside signal
  handlers, as this is prone to deadlock. Tainted values used in conditional
  expressions inside signal handlers will instead be aggregated via bitwise
  OR, and can be accessed using
  ``dfsan_label dfsan_get_labels_in_signal_conditional();``.

* ``-dfsan-track-origins`` -- Controls how to track origins. When its value is
  0, the runtime does not track origins. When its value is 1, the runtime tracks
  origins at memory store operations. When its value is 2, the runtime tracks
  origins at memory load and store operations. Its default value is 0.

* ``-dfsan-instrument-with-call-threshold`` -- If a function being instrumented
  requires more than this number of origin stores, use callbacks instead of
  inline checks (-1 means never use callbacks). Its default value is 3500.

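The callbacks required by ``-dfsan-event-callbacks`` could be defined along the
following lines. This is a minimal sketch that only counts events: the
``EventStats`` structure and the ``g_event_stats`` variable are hypothetical
names introduced for this example, and ``dfsan_label`` is declared locally as
an 8-bit stand-in so that the snippet compiles without the DFSan runtime. The
definitions use ``extern "C"`` on the assumption that the instrumentation
refers to the callbacks by their unmangled names.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  // Stand-in for the type from sanitizer/dfsan_interface.h (assumption, so
  // that this sketch builds without the DFSan runtime).
  typedef uint8_t dfsan_label;

  // Hypothetical bookkeeping for this example.
  struct EventStats {
    unsigned long tainted_loads = 0;
    unsigned long tainted_stores = 0;
    unsigned long transfers = 0;
    unsigned long transfer_len_total = 0;
    unsigned long tainted_compares = 0;
  };
  static EventStats g_event_stats;

  // Count only loads whose value carries a non-zero label.
  extern "C" void __dfsan_load_callback(dfsan_label Label, void *Addr) {
    (void)Addr;
    if (Label != 0) ++g_event_stats.tainted_loads;
  }

  // Count only stores whose value carries a non-zero label.
  extern "C" void __dfsan_store_callback(dfsan_label Label, void *Addr) {
    (void)Addr;
    if (Label != 0) ++g_event_stats.tainted_stores;
  }

  // Record each memory transfer and its length; Start is not dereferenced.
  extern "C" void __dfsan_mem_transfer_callback(dfsan_label *Start,
                                                size_t Len) {
    (void)Start;
    ++g_event_stats.transfers;
    g_event_stats.transfer_len_total += Len;
  }

  // Count comparisons in which at least one operand is labelled.
  extern "C" void __dfsan_cmp_callback(dfsan_label CombinedLabel) {
    if (CombinedLabel != 0) ++g_event_stats.tainted_compares;
  }

Keeping the callbacks this small is a deliberate choice for the sketch, since
they may be invoked for every load, store, transfer and comparison in the
instrumented code.
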
Environment Variables
---------------------

The following runtime options are set through the ``DFSAN_OPTIONS`` environment
variable.

* ``warn_unimplemented`` -- Whether to warn on unimplemented functions. Its
  default value is false.
* ``strict_data_dependencies`` -- Whether to propagate labels only when there is
  an explicit, obvious data dependency (e.g., when comparing strings, ignore the
  fact that the output of the comparison might be implicitly data-dependent on
  the content of the strings). This applies only to functions with the ``custom``
  category in the ABI list. Its default value is true.
* ``origin_history_size`` -- The limit of origin chain length. Non-positive values
  mean unlimited. Its default value is 16.
* ``origin_history_per_stack_limit`` -- The limit on an origin node's reference
  count. Non-positive values mean unlimited. Its default value is 20000.
* ``store_context_size`` -- The depth limit of origin tracking stack traces. Its
  default value is 20.
* ``zero_in_malloc`` -- Whether to zero the shadow space of newly allocated
  memory. Its default value is true.
* ``zero_in_free`` -- Whether to zero the shadow space of deallocated memory. Its
  default value is true.

Example
=======

DataFlowSanitizer supports up to 8 labels, to achieve low CPU and code
size overhead. Base labels are simply 8-bit unsigned integers that are
powers of 2 (i.e. 1, 2, 4, 8, ..., 128), and union labels are created
by ORing base labels.

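The label arithmetic described above can be sketched without the DFSan runtime.
In the snippet below, ``base_label``, ``label_union`` and ``has_label`` are
hypothetical helpers written for this illustration, and ``dfsan_label`` is
declared locally as an 8-bit stand-in for the type from
``sanitizer/dfsan_interface.h``.

.. code-block:: c++

  #include <cstdint>

  // Stand-in for the type from sanitizer/dfsan_interface.h (assumption, so
  // that this sketch builds without the DFSan runtime).
  typedef uint8_t dfsan_label;

  // The i-th base label (0 <= i < 8) is the power of two with only bit i set.
  constexpr dfsan_label base_label(unsigned i) {
    return static_cast<dfsan_label>(1u << i);
  }

  // A union label is the bitwise OR of the labels being combined.
  constexpr dfsan_label label_union(dfsan_label a, dfsan_label b) {
    return static_cast<dfsan_label>(a | b);
  }

  // A label contains another one if every bit of the latter is set in it.
  constexpr bool has_label(dfsan_label label, dfsan_label elem) {
    return (label & elem) == elem;
  }

  static_assert(base_label(0) == 1, "first base label");
  static_assert(base_label(7) == 128, "last of the 8 base labels");
  static_assert(label_union(base_label(0), base_label(1)) == 3,
                "union of labels 1 and 2");
  static_assert(has_label(3, base_label(1)), "3 contains label 2");
  static_assert(!has_label(3, base_label(2)), "3 does not contain label 4");

Because a label is a set of at most 8 bits, combining labels costs a single
bitwise operation.
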
The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 100;
    int j = 200;
    int k = 300;
    dfsan_label i_label = 1;
    dfsan_label j_label = 2;
    dfsan_label k_label = 4;
    dfsan_set_label(i_label, &i, sizeof(i));
    dfsan_set_label(j_label, &j, sizeof(j));
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);

    assert(ij_label & i_label);  // ij_label has i_label
    assert(ij_label & j_label);  // ij_label has j_label
    assert(!(ij_label & k_label));  // ij_label doesn't have k_label
    assert(ij_label == 3);  // Verifies all of the above

    // Or, equivalently:
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    dfsan_label ijk_label = dfsan_get_label(i + j + k);

    assert(ijk_label & i_label);  // ijk_label has i_label
    assert(ijk_label & j_label);  // ijk_label has j_label
    assert(ijk_label & k_label);  // ijk_label has k_label
    assert(ijk_label == 7);  // Verifies all of the above

    // Or, equivalently:
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

Origin Tracking
===============

DataFlowSanitizer can track origins of labeled values. This feature is enabled by
``-mllvm -dfsan-track-origins=1``. For example:

.. code-block:: console

  % cat test.cc
  #include <sanitizer/dfsan_interface.h>
  #include <stdio.h>

  int main(int argc, char** argv) {
    int i = 0;
    dfsan_set_label(1, &i, sizeof(i));
    int j = i + 1;
    dfsan_print_origin_trace(&j, "A flow from i to j");
    return 0;
  }

  % clang++ -fsanitize=dataflow -mllvm -dfsan-track-origins=1 -fno-omit-frame-pointer -g -O2 test.cc
  % ./a.out
  Taint value 0x1 (at 0x7ffd42bf415c) origin tracking (A flow from i to j)
  Origin value: 0x13900001, Taint value was stored to memory at
    #0 0x55676db85a62 in main test.cc:7:7
    #1 0x7f0083611bbc in __libc_start_main libc-start.c:285

  Origin value: 0x9e00001, Taint value was created at
    #0 0x55676db85a08 in main test.cc:6:3
    #1 0x7f0083611bbc in __libc_start_main libc-start.c:285

With ``-mllvm -dfsan-track-origins=1``, DataFlowSanitizer collects only the
intermediate stores a labeled value went through. Origin tracking slows down
program execution by a factor of 2 on top of the usual DataFlowSanitizer
slowdown and increases memory overhead by 1x. With ``-mllvm -dfsan-track-origins=2``,
DataFlowSanitizer also collects the intermediate loads a labeled value went
through. This mode slows down program execution by a factor of 4.

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.