===================
Debugging with XRay
===================

This document shows an example of how you would go about analyzing applications
built with XRay instrumentation. Here we will attempt to debug ``llc``
compiling some sample LLVM IR generated by Clang.

.. contents::
  :local:

Building with XRay
------------------

To debug an application with XRay instrumentation, we need to build it with a
Clang that supports the ``-fxray-instrument`` option. See `XRay <XRay.html>`_
for background on how XRay works.

In our example, we need to add ``-fxray-instrument`` to the list of flags
passed to Clang when building a binary. Note that we also need to link with
Clang so that the XRay runtime is linked in appropriately. To build ``llc``
with XRay, we configure our LLVM build as follows:

::

  $ mkdir -p llvm-build && cd llvm-build
  # Assume that the LLVM sources are at ../llvm
  $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument"
  # Once this finishes, we should build llc
  $ ninja llc

To verify that we have an XRay instrumented binary, we can use ``objdump`` to
look for the ``xray_instr_map`` section.

::

  $ objdump -h -j xray_instr_map ./bin/llc
  ./bin/llc:     file format elf64-x86-64

  Sections:
  Idx Name          Size      VMA               LMA               File off  Algn
   14 xray_instr_map 00002fc0  00000000041516c6  00000000041516c6  03d516c6  2**0
                    CONTENTS, ALLOC, LOAD, READONLY, DATA

Getting Traces
--------------

By default, XRay does not write out the trace files or patch the application
before ``main`` starts. If we run ``llc`` it should work like a normally built
binary. If we want to get a full trace of the application's operations (of the
functions we do end up instrumenting with XRay) then we need to enable XRay
at application start. To do this, XRay checks the ``XRAY_OPTIONS`` environment
variable.

::

  # The following doesn't create an XRay trace by default.
  $ ./bin/llc input.ll

  # We need to set the XRAY_OPTIONS to enable some features.
  $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./bin/llc input.ll
  ==69819==XRay: Log file in 'xray-log.llc.m35qPB'

At this point we now have an XRay trace we can start analyzing.

The ``llvm-xray`` Tool
----------------------

Having a trace then allows us to do basic accounting of the functions that were
instrumented, and how much time we're spending in parts of the code. To make
sense of this data, we use the ``llvm-xray`` tool which has a few subcommands
to help us understand our trace.

One of the things we can do is to get an accounting of the functions that have
been instrumented. We can see an example accounting with ``llvm-xray account``:

::

  $ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
  Functions with latencies: 29
     funcid      count [      min,       med,       90p,       99p,       max]       sum  function
        187        360 [ 0.000000,  0.000001,  0.000014,  0.000032,  0.000075]  0.001596  LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier()
         85        130 [ 0.000000,  0.000000,  0.000018,  0.000023,  0.000156]  0.000799  X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*)
        138        130 [ 0.000000,  0.000000,  0.000017,  0.000155,  0.000155]  0.000774  SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int)
        188        103 [ 0.000000,  0.000000,  0.000003,  0.000123,  0.000214]  0.000737  LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*)
         88          1 [ 0.000562,  0.000562,  0.000562,  0.000562,  0.000562]  0.000562  X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&)
        125        102 [ 0.000001,  0.000003,  0.000010,  0.000017,  0.000049]  0.000471  Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&)
         90          8 [ 0.000023,  0.000035,  0.000106,  0.000106,  0.000106]  0.000342  X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const
        124         32 [ 0.000003,  0.000007,  0.000016,  0.000041,  0.000041]  0.000310  Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&)
        123          1 [ 0.000302,  0.000302,  0.000302,  0.000302,  0.000302]  0.000302  LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl()
        139         46 [ 0.000000,  0.000002,  0.000006,  0.000008,  0.000019]  0.000138  TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const

This shows us that for our input file, ``llc`` spent the most cumulative time
in the lexer (a total of about 1.6 milliseconds). If we wanted, for example, to
work with this data in a spreadsheet, we can output the results as CSV by
passing the ``-format=csv`` option to the command.

If we want to get a textual representation of the raw trace we can use the
``llvm-xray convert`` tool to get YAML output. The first few lines of that
output for an example trace would look like the following:

::

  $ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB
  ---
  header:
    version:         1
    type:            0
    constant-tsc:    true
    nonstop-tsc:     true
    cycle-frequency: 2601000000
  records:
    - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 }
    - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 }
    - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 }
    - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 }
    - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 }
    - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 }
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 }
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 }
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 }
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 }
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 }
    - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 }

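The ``tsc`` values in these records are raw cycle counter readings, so turning
a matching enter/exit pair into wall-clock time means dividing the difference
by the ``cycle-frequency`` reported in the header. The following is a minimal
sketch of that arithmetic, not code taken from ``llvm-xray``. The helper name
``tscDeltaToSeconds`` is hypothetical, and the sketch assumes both readings
come from a constant, non-stop TSC, as the header above reports.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical helper (not part of llvm-xray): converts the TSC readings of
  // a matching function-enter/function-exit pair into seconds.
  double tscDeltaToSeconds(std::uint64_t EnterTSC, std::uint64_t ExitTSC,
                           std::uint64_t CycleFrequency) {
    assert(CycleFrequency > 0 && "cycle-frequency must be non-zero");
    assert(ExitTSC >= EnterTSC && "exit must not precede enter");
    return static_cast<double>(ExitTSC - EnterTSC) /
           static_cast<double>(CycleFrequency);
  }

For the first ``__cxx_global_var_init.8`` pair above, the two readings are
254532 cycles apart, which at 2601000000 cycles per second is roughly 98
microseconds.
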
Controlling Fidelity
--------------------

So far in our examples, we haven't been getting full coverage of the functions
we have in the binary. To get that, we need to modify the compiler flags so
that we can instrument more (if not all) of the functions we have in the
binary. We have two options for doing that, and we explore both of these below.

Instruction Threshold
`````````````````````

The first "blunt" way of doing this is by setting the minimum instruction
threshold for function bodies to 1. We can do that with the
``-fxray-instruction-threshold=N`` flag when building our binary. We rebuild
``llc`` with this option and observe the results:

::

  $ rm CMakeCache.txt
  $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \
      -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1"
  $ ninja llc
  $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./bin/llc input.ll
  ==69819==XRay: Log file in 'xray-log.llc.5rqxkU'

  $ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc
  Functions with latencies: 36652
     funcid      count [      min,       med,       90p,       99p,       max]       sum  function
         75          1 [ 0.672368,  0.672368,  0.672368,  0.672368,  0.672368]  0.672368  llc.cpp:271:0: main
         78          1 [ 0.626455,  0.626455,  0.626455,  0.626455,  0.626455]  0.626455  llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&)
     139617          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&)
     139610          1 [ 0.472618,  0.472618,  0.472618,  0.472618,  0.472618]  0.472618  LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&)
     139612          1 [ 0.470948,  0.470948,  0.470948,  0.470948,  0.470948]  0.470948  LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&)
     139607          2 [ 0.147345,  0.315994,  0.315994,  0.315994,  0.315994]  0.463340  LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&)
     139605         21 [ 0.000002,  0.000002,  0.102593,  0.213336,  0.213336]  0.463331  LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&)
     139563      26096 [ 0.000002,  0.000002,  0.000037,  0.000063,  0.000215]  0.225708  LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool)
     108055        188 [ 0.000002,  0.000120,  0.001375,  0.004523,  0.062624]  0.159279  MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&)
      62635         22 [ 0.000041,  0.000046,  0.000050,  0.126744,  0.126744]  0.127715  X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const

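The jump from 29 to 36652 functions with latencies follows from how the
threshold is applied. As described in the XRay documentation, Clang by default
only instruments functions that have at least 200 instructions or that contain
loops. The rule can be modeled roughly as below. This is a simplified sketch,
not the compiler's implementation: ``wouldInstrument`` and its parameters are
hypothetical names, the default of 200 is an assumption taken from that
documentation, and attributes or attribute list files can override the outcome
in either direction.

.. code-block:: c++

  #include <cassert>

  // Hypothetical model of the default selection rule. The real decision is
  // made inside the compiler and also honors attributes and attribute lists.
  bool wouldInstrument(unsigned InstructionCount, bool HasLoop,
                       unsigned Threshold = 200) {
    assert(InstructionCount > 0 && "a function body has at least one instruction");
    // A function qualifies if it is large enough or if it contains a loop.
    return HasLoop || InstructionCount >= Threshold;
  }

Under this model, a 50-instruction function without loops is skipped at the
default threshold, while with a threshold of 1 every function qualifies.
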
Instrumentation Attributes
``````````````````````````

The other way is to use configuration files for selecting which functions
should always be instrumented by the compiler. This gives us a way of ensuring
that certain functions are either always or never instrumented without having
to add the attribute to the source.

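For comparison, marking functions directly in the source uses Clang's
``xray_always_instrument`` and ``xray_never_instrument`` attributes. The
snippet below is only an illustration: the functions ``countSpaces`` and
``isBlank`` are hypothetical and do not come from the LLVM sources.

.. code-block:: c++

  #include <cstddef>
  #include <string>

  // Always instrumented, even though it is far below the default instruction
  // threshold.
  [[clang::xray_always_instrument]] std::size_t
  countSpaces(const std::string &S) {
    std::size_t Count = 0;
    for (char C : S)
      if (C == ' ')
        ++Count;
    return Count;
  }

  // Never instrumented, regardless of the threshold in effect.
  [[clang::xray_never_instrument]] bool isBlank(const std::string &S) {
    return countSpaces(S) == S.size();
  }

Attribute list files achieve the same effect without touching the source, which
is convenient for code you do not own or do not want to annotate.
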
To use this feature, you list the functions to always instrument and the
functions to never instrument in an attribute list file. The format of this
file is exactly the same as that of the SanitizerLists files that control
similar things for the sanitizer implementations. For example:

::

  # xray-attr-list.txt
  # always instrument functions that match the following filters:
  [always]
  fun:main

  # never instrument functions that match the following filters:
  [never]
  fun:__cxx_*

Given the file above we can re-build by providing it to the
``-fxray-attr-list=`` flag to clang. You can have multiple files, each defining
different sets of attributes, to be combined into a single list by clang.

The XRay stack tool
-------------------

Given a trace, and optionally an instrumentation map, the ``llvm-xray stack``
command can be used to analyze a call stack graph constructed from the function
call timeline.

A typical use of the command is to output the top stacks by call count and time
spent.

::

  $ llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc

  Unique Stacks: 3069
  Top 10 Stacks by leaf sum:

  Sum: 9633790
  lvl   function                                                            count              sum
  #0    main                                                                    1         58421550
  #1    compileModule(char**, llvm::LLVMContext&)                               1         51440360
  #2    llvm::legacy::PassManagerImpl::run(llvm::Module&)                       1         40535375
  #3    llvm::FPPassManager::runOnModule(llvm::Module&)                         2         39337525
  #4    llvm::FPPassManager::runOnFunction(llvm::Function&)                     6         39331465
  #5    llvm::PMDataManager::verifyPreservedAnalysis(llvm::Pass*)             399         16628590
  #6    llvm::PMTopLevelManager::findAnalysisPass(void const*)               4584         15155600
  #7    llvm::PMDataManager::findAnalysisPass(void const*, bool)            32088          9633790

  ..etc..

In the default mode, identical stacks on different threads are independently
aggregated. In a multithreaded program, you may end up having identical call
stacks fill your list of top calls.

To address this, you may specify the ``-aggregate-threads`` or
``-per-thread-stacks`` flags. ``-per-thread-stacks`` treats the thread id as an
implicit root in each call stack tree, while ``-aggregate-threads`` combines
identical stacks from all threads.

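The effect of combining identical stacks can be pictured with a small sketch.
This is not how ``llvm-xray`` is implemented; the types ``Stack`` and
``PerThreadTotals`` and the function ``aggregateThreads`` are hypothetical
names used only to show totals keyed by thread being merged into totals keyed
by stack alone.

.. code-block:: c++

  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // One call stack, outermost frame first.
  using Stack = std::vector<std::string>;

  // Hypothetical input: totals keyed by (thread id, stack).
  using PerThreadTotals =
      std::map<std::pair<std::uint32_t, Stack>, std::uint64_t>;

  // Merges identical stacks across threads by summing their totals.
  std::map<Stack, std::uint64_t>
  aggregateThreads(const PerThreadTotals &Totals) {
    std::map<Stack, std::uint64_t> Merged;
    for (const auto &Entry : Totals)
      Merged[Entry.first.second] += Entry.second;
    return Merged;
  }

If two threads each record the stack ``main`` then ``f``, the per-thread input
holds two entries for it, while the merged result holds one entry with the
summed total.
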
Flame Graph Generation
----------------------

The ``llvm-xray stack`` tool may also be used to generate flamegraphs for
visualizing your instrumented invocations. The tool does not generate the graphs
themselves, but instead generates a format that can be used with Brendan Gregg's
FlameGraph tool, currently available on `GitHub
<https://github.com/brendangregg/FlameGraph>`_.

To generate output for a flamegraph, a few more options are necessary.

- ``-all-stacks`` - Emits all of the stacks.
- ``-stack-format`` - Chooses the output format; use ``flame`` for flamegraph
  output.
- ``-aggregation-type`` - Chooses the metric to graph.

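The FlameGraph scripts read "folded" stacks: one line per stack, with the
frames joined by semicolons (outermost first), followed by a space and a
numeric value. The sketch below formats one such line. It illustrates the input
convention of ``flamegraph.pl`` only; the function ``foldStack`` is a
hypothetical name, and the exact text that ``llvm-xray`` emits may differ in
details.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <vector>

  // Formats one stack as a folded line: frames joined by ';' (outermost
  // first), then a space, then the metric value.
  std::string foldStack(const std::vector<std::string> &Frames,
                        std::uint64_t Value) {
    assert(!Frames.empty() && "a stack needs at least one frame");
    std::string Line;
    for (std::size_t I = 0; I < Frames.size(); ++I) {
      if (I != 0)
        Line += ';';
      Line += Frames[I];
    }
    Line += ' ';
    Line += std::to_string(Value);
    return Line;
  }

The value at the end of each line is the metric being graphed, which is what
``-aggregation-type`` selects.
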
You may pipe the command output directly to the flamegraph tool to obtain an
SVG file.

::

  $ llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc -stack-format=flame -aggregation-type=time -all-stacks | \
    /path/to/FlameGraph/flamegraph.pl > flamegraph.svg

If you open the SVG in a browser, mouse events allow exploring the call stacks.

Chrome Trace Viewer Visualization
---------------------------------

We can also generate a trace which can be loaded by the Chrome Trace Viewer
from the same generated trace:

::

  $ llvm-xray convert -symbolize -instr_map=./bin/llc \
    -output-format=trace_event xray-log.llc.5rqxkU \
      | gzip > llc-trace.txt.gz

From a Chrome browser, navigating to ``chrome://tracing`` allows us to load
the ``llc-trace.txt.gz`` file to visualize the execution trace.

Further Exploration
-------------------

The ``llvm-xray`` tool has a few other subcommands that are in various stages
of being developed. One subcommand that can highlight a few interesting things
is the ``graph`` subcommand. Given, for example, the following toy program that
we build with XRay instrumentation, we can see how the generated graph may be a
helpful indicator of where time is being spent in the application.

.. code-block:: c++

  // sample.cc
  #include <iostream>
  #include <thread>

  [[clang::xray_always_instrument]] void f() {
    std::cerr << '.';
  }

  [[clang::xray_always_instrument]] void g() {
    for (int i = 0; i < 1 << 10; ++i) {
      std::cerr << '-';
    }
  }

  int main(int argc, char* argv[]) {
    std::thread t1([] {
      for (int i = 0; i < 1 << 10; ++i)
        f();
    });
    std::thread t2([] {
      g();
    });
    t1.join();
    t2.join();
    std::cerr << '\n';
  }

We then build the above with XRay instrumentation:

::

  $ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1
  $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" ./sample

We can then explore the graph rendering of the trace generated by this sample
application. We assume you have the graphviz tools available in your system,
including both ``unflatten`` and ``dot``. If you prefer rendering or exploring
the graph using another tool, then that should be feasible as well.
``llvm-xray graph`` will create DOT format graphs which should be usable in
most graph rendering applications. One example invocation of the
``llvm-xray graph`` command should yield some interesting insights into the
workings of C++ applications:

::

  $ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \
      | unflatten -f -l10 | dot -Tsvg -o sample.svg

Next Steps
----------

If you have some interesting analyses you'd like to implement as part of the
llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list.
The following are some ideas to inspire you in getting involved and potentially
making things better.

- Implement a query/filtering library that allows for finding patterns in the
  XRay traces.
- Collect function call stacks and how often they're encountered in the
  XRay trace.