Mercurial > hg > CbC > CbC_llvm
comparison docs/XRayFDRFormat.rst @ 121:803732b1fca8
LLVM 5.0
author | kono |
---|---|
date | Fri, 27 Oct 2017 17:07:41 +0900 |
parents | |
children | c2174574ed3a |
comparison
equal
deleted
inserted
replaced
120:1172e4bd9c6f | 121:803732b1fca8 |
---|---|
1 ====================================== | |
2 XRay Flight Data Recorder Trace Format | |
3 ====================================== | |
4 | |
5 :Version: 1 as of 2017-07-20 | |
6 | |
7 .. contents:: | |
8 :local: | |
9 | |
10 | |
11 Introduction | |
12 ============ | |
13 | |
14 When gathering XRay traces in Flight Data Recorder mode, each thread of an | |
15 application will claim buffers to fill with trace data, which at some point | |
16 is finalized and flushed. | |
17 | |
18 A goal of the profiler is to minimize overhead, so the flushed data directly | |
19 corresponds to the buffer. | |
20 | |
21 This document describes the format of a trace file. | |
22 | |
23 | |
24 General | |
25 ======= | |
26 | |
27 Each trace file corresponds to a sequence of events in a particular thread. | |
28 | |
29 The file has a header followed by a sequence of discriminated record types. | |
30 | |
31 The endianness of byte fields matches the endianess of the platform which | |
32 produced the trace file. | |
33 | |
34 | |
35 Header Section | |
36 ============== | |
37 | |
38 A trace file begins with a 32 byte header. | |
39 | |
40 +-------------------+-----------------+----------------------------------------+ | |
41 | Field | Size (bytes) | Description | | |
42 +===================+=================+========================================+ | |
43 | version | ``2`` | Anticipates versioned readers. This | | |
44 | | | document describes the format when | | |
45 | | | version == 1 | | |
46 +-------------------+-----------------+----------------------------------------+ | |
47 | type | ``2`` | An enumeration encoding the type of | | |
48 | | | trace. Flight Data Recorder mode | | |
49 | | | traces have type == 1 | | |
50 +-------------------+-----------------+----------------------------------------+ | |
51 | bitfield | ``4`` | Holds parameters that are not aligned | | |
52 | | | to bytes. Further described below. | | |
53 +-------------------+-----------------+----------------------------------------+ | |
54 | cycle_frequency | ``8`` | The frequency in hertz of the CPU | | |
55 | | | oscillator used to measure duration of | | |
56 | | | events in ticks. | | |
57 +-------------------+-----------------+----------------------------------------+ | |
58 | buffer_size | ``8`` | The size in bytes of the data portion | | |
59 | | | of the trace following the header. | | |
60 +-------------------+-----------------+----------------------------------------+ | |
61 | reserved | ``8`` | Reserved for future use. | | |
62 +-------------------+-----------------+----------------------------------------+ | |
63 | |
64 The bitfield parameter of the file header is composed of the following fields. | |
65 | |
66 +-------------------+----------------+-----------------------------------------+ | |
67 | Field | Size (bits) | Description | | |
68 +===================+================+=========================================+ | |
69 | constant_tsc | ``1`` | Whether the platform's timestamp | | |
70 | | | counter used to record ticks between | | |
71 | | | events ticks at a constant frequency | | |
72 | | | despite CPU frequency changes. | | |
73 | | | 0 == non-constant. 1 == constant. | | |
74 +-------------------+----------------+-----------------------------------------+ | |
75 | nonstop_tsc | ``1`` | Whether the tsc continues to count | | |
76 | | | despite whether the CPU is in a low | | |
77 | | | power state. 0 == stop. 1 == non-stop. | | |
78 +-------------------+----------------+-----------------------------------------+ | |
79 | reserved | ``30`` | Not meaningful. | | |
80 +-------------------+----------------+-----------------------------------------+ | |
81 | |
82 | |
83 Data Section | |
84 ============ | |
85 | |
86 Following the header in a trace is a data section with size matching the | |
87 buffer_size field in the header. | |
88 | |
89 The data section is a stream of elements of different types. | |
90 | |
91 There are a few categories of data in the sequence. | |
92 | |
93 - ``Function Records``: Function Records contain the timing of entry into and | |
94 exit from function execution. Function Records have 8 bytes each. | |
95 | |
96 - ``Metadata Records``: Metadata records serve many purposes. Mostly, they | |
97 capture information that may be too costly to record for each function, but | |
98 that is required to contextualize the fine-grained timings. They also are used | |
99 as markers for user-defined Event Data payloads. Metadata records have 16 | |
100 bytes each. | |
101 | |
102 - ``Event Data``: Free form data may be associated with events that are traced | |
103 by the binary and encode data defined by a handler function. Event data is | |
104 always preceded with a marker record which indicates how large it is. | |
105 | |
106 - ``Function Arguments``: The arguments to some functions are included in the | |
107 trace. These are either pointer addresses or primitives that are read and | |
108 logged independently of their types in a high level language. To the tracer, | |
109 they are all simply numbers. Function Records that have attached arguments | |
110 will indicate their presence on the function entry record. We only support | |
111 logging contiguous function argument sequences starting with argument zero, | |
112 which will be the "this" pointer for member function invocations. For example, | |
113 we don't support logging the first and third argument. | |
114 | |
115 A reader of the memory format must maintain a state machine. The format makes no | |
116 attempt to pad for alignment, and it is not seekable. | |
117 | |
118 | |
119 Function Records | |
120 ---------------- | |
121 | |
122 Function Records have an 8 byte layout. This layout encodes information to | |
123 reconstruct a call stack of instrumented function and their durations. | |
124 | |
125 +---------------+--------------+-----------------------------------------------+ | |
126 | Field | Size (bits) | Description | | |
127 +===============+==============+===============================================+ | |
128 | discriminant | ``1`` | Indicates whether a reader should read a | | |
129 | | | Function or Metadata record. Set to ``0`` for | | |
130 | | | Function records. | | |
131 +---------------+--------------+-----------------------------------------------+ | |
132 | action | ``3`` | Specifies whether the function is being | | |
133 | | | entered, exited, or is a non-standard entry | | |
134 | | | or exit produced by optimizations. | | |
135 +---------------+--------------+-----------------------------------------------+ | |
136 | function_id | ``28`` | A numeric ID for the function. Resolved to a | | |
137 | | | name via the xray instrumentation map. The | | |
138 | | | instrumentation map is built by xray at | | |
139 | | | compile time into an object file and pairs | | |
140 | | | the function ids to addresses. It is used for | | |
141 | | | patching and as a lookup into the binary's | | |
142 | | | symbols to obtain names. | | |
143 +---------------+--------------+-----------------------------------------------+ | |
144 | tsc_delta | ``32`` | The number of ticks of the timestamp counter | | |
145 | | | since a previous record recorded a delta or | | |
146 | | | other TSC resetting event. | | |
147 +---------------+--------------+-----------------------------------------------+ | |
148 | |
149 On little-endian machines, the bitfields are ordered from least significant bit | |
150 bit to most significant bit. A reader can read an 8 bit value and apply the mask | |
151 ``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned | |
152 shift right by ``0x04`` to obtain the function_id field. | |
153 | |
154 On big-endian machine, the bitfields are written in order from most significant | |
155 bit to least significant bit. A reader would read an 8 bit value and unsigned | |
156 shift right by 7 bits for the discriminant. The function_id field could be | |
157 obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``. | |
158 | |
159 Function action types are as follows. | |
160 | |
161 +---------------+--------------+-----------------------------------------------+ | |
162 | Type | Number | Description | | |
163 +===============+==============+===============================================+ | |
164 | Entry | ``0`` | Typical function entry. | | |
165 +---------------+--------------+-----------------------------------------------+ | |
166 | Exit | ``1`` | Typical function exit. | | |
167 +---------------+--------------+-----------------------------------------------+ | |
168 | Tail_Exit | ``2`` | An exit from a function due to tail call | | |
169 | | | optimization. | | |
170 +---------------+--------------+-----------------------------------------------+ | |
171 | Entry_Args | ``3`` | A function entry that records arguments. | | |
172 +---------------+--------------+-----------------------------------------------+ | |
173 | |
174 Entry_Args records do not contain the arguments themselves. Instead, metadata | |
175 records for each of the logged args follow the function record in the stream. | |
176 | |
177 | |
178 Metadata Records | |
179 ---------------- | |
180 | |
181 Interspersed throughout the buffer are 16 byte Metadata records. For typically | |
182 instrumented binaries, they will be sparser than Function records, and they | |
183 provide a fuller picture of the binary execution state. | |
184 | |
185 Metadata record layout is partially record dependent, but they share a common | |
186 structure. | |
187 | |
188 The same bit field rules described for function records apply to the first byte | |
189 of MetadataRecords. Within this byte, little endian machines use lsb to msb | |
190 ordering and big endian machines use msb to lsb ordering. | |
191 | |
192 +---------------+--------------+-----------------------------------------------+ | |
193 | Field | Size | Description | | |
194 +===============+==============+===============================================+ | |
195 | discriminant | ``1 bit`` | Indicates whether a reader should read a | | |
196 | | | Function or Metadata record. Set to ``1`` for | | |
197 | | | Metadata records. | | |
198 +---------------+--------------+-----------------------------------------------+ | |
199 | record_kind | ``7 bits`` | The type of Metadata record. | | |
200 +---------------+--------------+-----------------------------------------------+ | |
201 | data | ``15 bytes`` | A data field used differently for each record | | |
202 | | | type. | | |
203 +---------------+--------------+-----------------------------------------------+ | |
204 | |
205 Here is a table of the enumerated record kinds. | |
206 | |
207 +--------+---------------------------+ | |
208 | Number | Type | | |
209 +========+===========================+ | |
210 | 0 | NewBuffer | | |
211 +--------+---------------------------+ | |
212 | 1 | EndOfBuffer | | |
213 +--------+---------------------------+ | |
214 | 2 | NewCPUId | | |
215 +--------+---------------------------+ | |
216 | 3 | TSCWrap | | |
217 +--------+---------------------------+ | |
218 | 4 | WallTimeMarker | | |
219 +--------+---------------------------+ | |
220 | 5 | CustomEventMarker | | |
221 +--------+---------------------------+ | |
222 | 6 | CallArgument | | |
223 +--------+---------------------------+ | |
224 | |
225 | |
226 NewBuffer Records | |
227 ----------------- | |
228 | |
229 Each buffer begins with a NewBuffer record immediately after the header. | |
230 It records the thread ID of the thread that the trace belongs to. | |
231 | |
232 Its data segment is as follows. | |
233 | |
234 +---------------+--------------+-----------------------------------------------+ | |
235 | Field | Size (bytes) | Description | | |
236 +===============+==============+===============================================+ | |
237 | thread_Id | ``2`` | Thread ID for buffer. | | |
238 +---------------+--------------+-----------------------------------------------+ | |
239 | reserved | ``13`` | Unused. | | |
240 +---------------+--------------+-----------------------------------------------+ | |
241 | |
242 | |
243 WallClockTime Records | |
244 --------------------- | |
245 | |
246 Following the NewBuffer record, each buffer records an absolute time as a frame | |
247 of reference for the durations recorded by timestamp counter deltas. | |
248 | |
249 Its data segment is as follows. | |
250 | |
251 +---------------+--------------+-----------------------------------------------+ | |
252 | Field | Size (bytes) | Description | | |
253 +===============+==============+===============================================+ | |
254 | seconds | ``8`` | Seconds on absolute timescale. The starting | | |
255 | | | point is unspecified and depends on the | | |
256 | | | implementation and platform configured by the | | |
257 | | | tracer. | | |
258 +---------------+--------------+-----------------------------------------------+ | |
259 | microseconds | ``4`` | The microsecond component of the time. | | |
260 +---------------+--------------+-----------------------------------------------+ | |
261 | reserved | ``3`` | Unused. | | |
262 +---------------+--------------+-----------------------------------------------+ | |
263 | |
264 | |
265 NewCpuId Records | |
266 ---------------- | |
267 | |
268 Each function entry invokes a routine to determine what CPU is executing. | |
269 Typically, this is done with readtscp, which reads the timestamp counter at the | |
270 same time. | |
271 | |
272 If the tracing detects that the execution has switched CPUs or if this is the | |
273 first instrumented entry point, the tracer will output a NewCpuId record. | |
274 | |
275 Its data segment is as follows. | |
276 | |
277 +---------------+--------------+-----------------------------------------------+ | |
278 | Field | Size (bytes) | Description | | |
279 +===============+==============+===============================================+ | |
280 | cpu_id | ``2`` | CPU Id. | | |
281 +---------------+--------------+-----------------------------------------------+ | |
282 | absolute_tsc | ``8`` | The absolute value of the timestamp counter. | | |
283 +---------------+--------------+-----------------------------------------------+ | |
284 | reserved | ``5`` | Unused. | | |
285 +---------------+--------------+-----------------------------------------------+ | |
286 | |
287 | |
288 TSCWrap Records | |
289 --------------- | |
290 | |
291 Since each function record uses a 32 bit value to represent the number of ticks | |
292 of the timestamp counter since the last reference, it is possible for this value | |
293 to overflow, particularly for sparsely instrumented binaries. | |
294 | |
295 When this delta would not fit into a 32 bit representation, a reference absolute | |
296 timestamp counter record is written in the form of a TSCWrap record. | |
297 | |
298 Its data segment is as follows. | |
299 | |
300 +---------------+--------------+-----------------------------------------------+ | |
301 | Field | Size (bytes) | Description | | |
302 +===============+==============+===============================================+ | |
303 | absolute_tsc | ``8`` | Timestamp counter value. | | |
304 +---------------+--------------+-----------------------------------------------+ | |
305 | reserved | ``7`` | Unused. | | |
306 +---------------+--------------+-----------------------------------------------+ | |
307 | |
308 | |
309 CallArgument Records | |
310 -------------------- | |
311 | |
312 Immediately following an Entry_Args type function record, there may be one or | |
313 more CallArgument records that contain the traced function's parameter values. | |
314 | |
315 The order of the CallArgument Record sequency corresponds one to one with the | |
316 order of the function parameters. | |
317 | |
318 CallArgument data segment: | |
319 | |
320 +---------------+--------------+-----------------------------------------------+ | |
321 | Field | Size (bytes) | Description | | |
322 +===============+==============+===============================================+ | |
323 | argument | ``8`` | Numeric argument (may be pointer address). | | |
324 +---------------+--------------+-----------------------------------------------+ | |
325 | reserved | ``7`` | Unused. | | |
326 +---------------+--------------+-----------------------------------------------+ | |
327 | |
328 | |
329 CustomEventMarker Records | |
330 ------------------------- | |
331 | |
332 XRay provides the feature of logging custom events. This may be leveraged to | |
333 record tracing info for RPCs or similarly trace data that is application | |
334 specific. | |
335 | |
336 Custom Events themselves are an unstructured (application defined) segment of | |
337 memory with arbitrary size within the buffer. They are preceded by | |
338 CustomEventMarkers to indicate their presence and size. | |
339 | |
340 CustomEventMarker data segment: | |
341 | |
342 +---------------+--------------+-----------------------------------------------+ | |
343 | Field | Size (bytes) | Description | | |
344 +===============+==============+===============================================+ | |
345 | event_size | ``4`` | Size of preceded event. | | |
346 +---------------+--------------+-----------------------------------------------+ | |
347 | absolute_tsc | ``8`` | A timestamp counter of the event. | | |
348 +---------------+--------------+-----------------------------------------------+ | |
349 | reserved | ``3`` | Unused. | | |
350 +---------------+--------------+-----------------------------------------------+ | |
351 | |
352 | |
353 EndOfBuffer Records | |
354 ------------------- | |
355 | |
356 An EndOfBuffer record type indicates that there is no more trace data in this | |
357 buffer. The reader is expected to seek past the remaining buffer_size expressed | |
358 before the start of buffer and look for either another header or EOF. | |
359 | |
360 | |
361 Format Grammar and Invariants | |
362 ============================= | |
363 | |
364 Not all sequences of Metadata records and Function records are valid data. A | |
365 sequence should be parsed as a state machine. The expectations for a valid | |
366 format can be expressed as a context free grammar. | |
367 | |
368 This is an attempt to explain the format with statements in EBNF format. | |
369 | |
370 - Format := Header ThreadBuffer* EOF | |
371 | |
372 - ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End | |
373 | |
374 - BodySequence := NewCPUId | TSCWrap | Function | CustomEvent | |
375 | |
376 - Function := (Function_Entry_Args CallArgument*) | Function_Other_Type | |
377 | |
378 - CustomEvent := CustomEventMarker CustomEventUnstructuredMemory | |
379 | |
380 - End := EndOfBuffer RemainingBufferSizeToSkip | |
381 | |
382 | |
383 Function Record Order | |
384 --------------------- | |
385 | |
386 There are a few clarifications that may help understand what is expected of | |
387 Function records. | |
388 | |
389 - Functions with an Exit are expected to have a corresponding Entry or | |
390 Entry_Args function record precede them in the trace. | |
391 | |
392 - Tail_Exit Function records record the Function ID of the function whose return | |
393 address the program counter will take. In other words, the final function that | |
394 would be popped off of the call stack if tail call optimization was not used. | |
395 | |
396 - Not all functions marked for instrumentation are necessarily in the trace. The | |
397 tracer uses heuristics to preserve the trace for non-trivial functions. | |
398 | |
399 - Not every entry must have a traced Exit or Tail Exit. The buffer may run out | |
400 of space or the program may request for the tracer to finalize toreturn the | |
401 buffer before an instrumented function exits. |