Mercurial > hg > CbC > CbC_llvm
diff llvm/docs/XRayExample.rst @ 150:1d019706d866
LLVM10
author | anatofuz |
---|---|
date | Thu, 13 Feb 2020 15:10:13 +0900 |
parents | |
children | 2e18cbf3894f |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/llvm/docs/XRayExample.rst Thu Feb 13 15:10:13 2020 +0900 @@ -0,0 +1,349 @@ +=================== +Debugging with XRay +=================== + +This document shows an example of how you would go about analyzing applications +built with XRay instrumentation. Here we will attempt to debug ``llc`` +compiling some sample LLVM IR generated by Clang. + +.. contents:: + :local: + +Building with XRay +------------------ + +To debug an application with XRay instrumentation, we need to build it with a +Clang that supports the ``-fxray-instrument`` option. See `XRay <XRay.html>`_ +for more technical details of how XRay works for background information. + +In our example, we need to add ``-fxray-instrument`` to the list of flags +passed to Clang when building a binary. Note that we need to link with Clang as +well to get the XRay runtime linked in appropriately. For building ``llc`` with +XRay, we do something similar below for our LLVM build: + +:: + + $ mkdir -p llvm-build && cd llvm-build + # Assume that the LLVM sources are at ../llvm + $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \ + -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument" -DCMAKE_CXX_FLAGS="-fxray-instrument" \ + # Once this finishes, we should build llc + $ ninja llc + + +To verify that we have an XRay instrumented binary, we can use ``objdump`` to +look for the ``xray_instr_map`` section. + +:: + + $ objdump -h -j xray_instr_map ./bin/llc + ./bin/llc: file format elf64-x86-64 + + Sections: + Idx Name Size VMA LMA File off Algn + 14 xray_instr_map 00002fc0 00000000041516c6 00000000041516c6 03d516c6 2**0 + CONTENTS, ALLOC, LOAD, READONLY, DATA + +Getting Traces +-------------- + +By default, XRay does not write out the trace files or patch the application +before main starts. If we run ``llc`` it should work like a normally built +binary. If we want to get a full trace of the application's operations (of the +functions we do end up instrumenting with XRay) then we need to enable XRay +at application start. To do this, XRay checks the ``XRAY_OPTIONS`` environment +variable. + +:: + + # The following doesn't create an XRay trace by default. + $ ./bin/llc input.ll + + # We need to set the XRAY_OPTIONS to enable some features. + $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic verbosity=1" ./bin/llc input.ll + ==69819==XRay: Log file in 'xray-log.llc.m35qPB' + +At this point we now have an XRay trace we can start analysing. + +The ``llvm-xray`` Tool +---------------------- + +Having a trace then allows us to do basic accounting of the functions that were +instrumented, and how much time we're spending in parts of the code. To make +sense of this data, we use the ``llvm-xray`` tool which has a few subcommands +to help us understand our trace. + +One of the things we can do is to get an accounting of the functions that have +been instrumented. We can see an example accounting with ``llvm-xray account``: + +:: + + $ llvm-xray account xray-log.llc.m35qPB -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc + Functions with latencies: 29 + funcid count [ min, med, 90p, 99p, max] sum function + 187 360 [ 0.000000, 0.000001, 0.000014, 0.000032, 0.000075] 0.001596 LLLexer.cpp:446:0: llvm::LLLexer::LexIdentifier() + 85 130 [ 0.000000, 0.000000, 0.000018, 0.000023, 0.000156] 0.000799 X86ISelDAGToDAG.cpp:1984:0: (anonymous namespace)::X86DAGToDAGISel::Select(llvm::SDNode*) + 138 130 [ 0.000000, 0.000000, 0.000017, 0.000155, 0.000155] 0.000774 SelectionDAGISel.cpp:2963:0: llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) + 188 103 [ 0.000000, 0.000000, 0.000003, 0.000123, 0.000214] 0.000737 LLParser.cpp:2692:0: llvm::LLParser::ParseValID(llvm::ValID&, llvm::LLParser::PerFunctionState*) + 88 1 [ 0.000562, 0.000562, 0.000562, 0.000562, 0.000562] 0.000562 X86ISelLowering.cpp:83:0: llvm::X86TargetLowering::X86TargetLowering(llvm::X86TargetMachine const&, llvm::X86Subtarget const&) + 125 102 [ 0.000001, 0.000003, 0.000010, 0.000017, 0.000049] 0.000471 Verifier.cpp:3714:0: (anonymous namespace)::Verifier::visitInstruction(llvm::Instruction&) + 90 8 [ 0.000023, 0.000035, 0.000106, 0.000106, 0.000106] 0.000342 X86ISelLowering.cpp:3363:0: llvm::X86TargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const + 124 32 [ 0.000003, 0.000007, 0.000016, 0.000041, 0.000041] 0.000310 Verifier.cpp:1967:0: (anonymous namespace)::Verifier::visitFunction(llvm::Function const&) + 123 1 [ 0.000302, 0.000302, 0.000302, 0.000302, 0.000302] 0.000302 LLVMContextImpl.cpp:54:0: llvm::LLVMContextImpl::~LLVMContextImpl() + 139 46 [ 0.000000, 0.000002, 0.000006, 0.000008, 0.000019] 0.000138 TargetLowering.cpp:506:0: llvm::TargetLowering::SimplifyDemandedBits(llvm::SDValue, llvm::APInt const&, llvm::APInt&, llvm::APInt&, llvm::TargetLowering::TargetLoweringOpt&, unsigned int, bool) const + +This shows us that for our input file, ``llc`` spent the most cumulative time +in the lexer (a total of 1 millisecond). If we wanted for example to work with +this data in a spreadsheet, we can output the results as CSV using the +``-format=csv`` option to the command for further analysis. + +If we want to get a textual representation of the raw trace we can use the +``llvm-xray convert`` tool to get YAML output. The first few lines of that +output for an example trace would look like the following: + +:: + + $ llvm-xray convert -f yaml -symbolize -instr_map=./bin/llc xray-log.llc.m35qPB + --- + header: + version: 1 + type: 0 + constant-tsc: true + nonstop-tsc: true + cycle-frequency: 2601000000 + records: + - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426023268520 } + - { type: 0, func-id: 110, function: __cxx_global_var_init.8, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426023523052 } + - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426029925386 } + - { type: 0, func-id: 164, function: __cxx_global_var_init, cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426030031128 } + - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426046951388 } + - { type: 0, func-id: 142, function: '(anonymous namespace)::CommandLineParser::ParseCommandLineOptions(int, char const* const*, llvm::StringRef, llvm::raw_ostream*)', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047282020 } + - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426047857332 } + - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426047984152 } + - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048036584 } + - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048042292 } + - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-enter, tsc: 5434426048055056 } + - { type: 0, func-id: 187, function: 'llvm::LLLexer::LexIdentifier()', cpu: 37, thread: 69819, kind: function-exit, tsc: 5434426048067316 } + +Controlling Fidelity +-------------------- + +So far in our examples, we haven't been getting full coverage of the functions +we have in the binary. To get that, we need to modify the compiler flags so +that we can instrument more (if not all) the functions we have in the binary. +We have two options for doing that, and we explore both of these below. + +Instruction Threshold +````````````````````` + +The first "blunt" way of doing this is by setting the minimum threshold for +function bodies to 1. We can do that with the +``-fxray-instruction-threshold=N`` flag when building our binary. We rebuild +``llc`` with this option and observe the results: + +:: + + $ rm CMakeCache.txt + $ cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Release \ + -DCMAKE_C_FLAGS_RELEASE="-fxray-instrument -fxray-instruction-threshold=1" \ + -DCMAKE_CXX_FLAGS="-fxray-instrument -fxray-instruction-threshold=1" + $ ninja llc + $ XRAY_OPTIONS="patch_premain=true" ./bin/llc input.ll + ==69819==XRay: Log file in 'xray-log.llc.5rqxkU' + + $ llvm-xray account xray-log.llc.5rqxkU -top=10 -sort=sum -sortorder=dsc -instr_map ./bin/llc + Functions with latencies: 36652 + funcid count [ min, med, 90p, 99p, max] sum function + 75 1 [ 0.672368, 0.672368, 0.672368, 0.672368, 0.672368] 0.672368 llc.cpp:271:0: main + 78 1 [ 0.626455, 0.626455, 0.626455, 0.626455, 0.626455] 0.626455 llc.cpp:381:0: compileModule(char**, llvm::LLVMContext&) + 139617 1 [ 0.472618, 0.472618, 0.472618, 0.472618, 0.472618] 0.472618 LegacyPassManager.cpp:1723:0: llvm::legacy::PassManager::run(llvm::Module&) + 139610 1 [ 0.472618, 0.472618, 0.472618, 0.472618, 0.472618] 0.472618 LegacyPassManager.cpp:1681:0: llvm::legacy::PassManagerImpl::run(llvm::Module&) + 139612 1 [ 0.470948, 0.470948, 0.470948, 0.470948, 0.470948] 0.470948 LegacyPassManager.cpp:1564:0: (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 139607 2 [ 0.147345, 0.315994, 0.315994, 0.315994, 0.315994] 0.463340 LegacyPassManager.cpp:1530:0: llvm::FPPassManager::runOnModule(llvm::Module&) + 139605 21 [ 0.000002, 0.000002, 0.102593, 0.213336, 0.213336] 0.463331 LegacyPassManager.cpp:1491:0: llvm::FPPassManager::runOnFunction(llvm::Function&) + 139563 26096 [ 0.000002, 0.000002, 0.000037, 0.000063, 0.000215] 0.225708 LegacyPassManager.cpp:1083:0: llvm::PMDataManager::findAnalysisPass(void const*, bool) + 108055 188 [ 0.000002, 0.000120, 0.001375, 0.004523, 0.062624] 0.159279 MachineFunctionPass.cpp:38:0: llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 62635 22 [ 0.000041, 0.000046, 0.000050, 0.126744, 0.126744] 0.127715 X86TargetMachine.cpp:242:0: llvm::X86TargetMachine::getSubtargetImpl(llvm::Function const&) const + + +Instrumentation Attributes +`````````````````````````` + +The other way is to use configuration files for selecting which functions +should always be instrumented by the compiler. This gives us a way of ensuring +that certain functions are either always or never instrumented by not having to +add the attribute to the source. + +To use this feature, you can define one file for the functions to always +instrument, and another for functions to never instrument. The format of these +files are exactly the same as the SanitizerLists files that control similar +things for the sanitizer implementations. For example: + +:: + + # xray-attr-list.txt + # always instrument functions that match the following filters: + [always] + fun:main + + # never instrument functions that match the following filters: + [never] + fun:__cxx_* + +Given the file above we can re-build by providing it to the +``-fxray-attr-list=`` flag to clang. You can have multiple files, each defining +different sets of attribute sets, to be combined into a single list by clang. + +The XRay stack tool +------------------- + +Given a trace, and optionally an instrumentation map, the ``llvm-xray stack`` +command can be used to analyze a call stack graph constructed from the function +call timeline. + +The way to use the command is to output the top stacks by call count and time spent. + +:: + + $ llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc + + Unique Stacks: 3069 + Top 10 Stacks by leaf sum: + + Sum: 9633790 + lvl function count sum + #0 main 1 58421550 + #1 compileModule(char**, llvm::LLVMContext&) 1 51440360 + #2 llvm::legacy::PassManagerImpl::run(llvm::Module&) 1 40535375 + #3 llvm::FPPassManager::runOnModule(llvm::Module&) 2 39337525 + #4 llvm::FPPassManager::runOnFunction(llvm::Function&) 6 39331465 + #5 llvm::PMDataManager::verifyPreservedAnalysis(llvm::Pass*) 399 16628590 + #6 llvm::PMTopLevelManager::findAnalysisPass(void const*) 4584 15155600 + #7 llvm::PMDataManager::findAnalysisPass(void const*, bool) 32088 9633790 + + ..etc.. + +In the default mode, identical stacks on different threads are independently +aggregated. In a multithreaded program, you may end up having identical call +stacks fill your list of top calls. + +To address this, you may specify the ``-aggregate-threads`` or +``-per-thread-stacks`` flags. ``-per-thread-stacks`` treats the thread id as an +implicit root in each call stack tree, while ``-aggregate-threads`` combines +identical stacks from all threads. + +Flame Graph Generation +---------------------- + +The ``llvm-xray stack`` tool may also be used to generate flamegraphs for +visualizing your instrumented invocations. The tool does not generate the graphs +themselves, but instead generates a format that can be used with Brendan Gregg's +FlameGraph tool, currently available on `github +<https://github.com/brendangregg/FlameGraph>`_. + +To generate output for a flamegraph, a few more options are necessary. + +- ``-all-stacks`` - Emits all of the stacks. +- ``-stack-format`` - Choose the flamegraph output format 'flame'. +- ``-aggregation-type`` - Choose the metric to graph. + +You may pipe the command output directly to the flamegraph tool to obtain an +svg file. + +:: + + $llvm-xray stack xray-log.llc.5rqxkU -instr_map ./bin/llc -stack-format=flame -aggregation-type=time -all-stacks | \ + /path/to/FlameGraph/flamegraph.pl > flamegraph.svg + +If you open the svg in a browser, mouse events allow exploring the call stacks. + +Chrome Trace Viewer Visualization +--------------------------------- + +We can also generate a trace which can be loaded by the Chrome Trace Viewer +from the same generated trace: + +:: + + $ llvm-xray convert -symbolize -instr_map=./bin/llc \ + -output-format=trace_event xray-log.llc.5rqxkU \ + | gzip > llc-trace.txt.gz + +From a Chrome browser, navigating to ``chrome:///tracing`` allows us to load +the ``sample-trace.txt.gz`` file to visualize the execution trace. + +Further Exploration +------------------- + +The ``llvm-xray`` tool has a few other subcommands that are in various stages +of being developed. One interesting subcommand that can highlight a few +interesting things is the ``graph`` subcommand. Given for example the following +toy program that we build with XRay instrumentation, we can see how the +generated graph may be a helpful indicator of where time is being spent for the +application. + +.. code-block:: c++ + + // sample.cc + #include <iostream> + #include <thread> + + [[clang::xray_always_instrument]] void f() { + std::cerr << '.'; + } + + [[clang::xray_always_instrument]] void g() { + for (int i = 0; i < 1 << 10; ++i) { + std::cerr << '-'; + } + } + + int main(int argc, char* argv[]) { + std::thread t1([] { + for (int i = 0; i < 1 << 10; ++i) + f(); + }); + std::thread t2([] { + g(); + }); + t1.join(); + t2.join(); + std::cerr << '\n'; + } + +We then build the above with XRay instrumentation: + +:: + + $ clang++ -o sample -O3 sample.cc -std=c++11 -fxray-instrument -fxray-instruction-threshold=1 + $ XRAY_OPTIONS="patch_premain=true xray_mode=xray-basic" ./sample + +We can then explore the graph rendering of the trace generated by this sample +application. We assume you have the graphviz toosl available in your system, +including both ``unflatten`` and ``dot``. If you prefer rendering or exploring +the graph using another tool, then that should be feasible as well. ``llvm-xray +graph`` will create DOT format graphs which should be usable in most graph +rendering applications. One example invocation of the ``llvm-xray graph`` +command should yield some interesting insights to the workings of C++ +applications: + +:: + + $ llvm-xray graph xray-log.sample.* -m sample -color-edges=sum -edge-label=sum \ + | unflatten -f -l10 | dot -Tsvg -o sample.svg + + +Next Steps +---------- + +If you have some interesting analyses you'd like to implement as part of the +llvm-xray tool, please feel free to propose them on the llvm-dev@ mailing list. +The following are some ideas to inspire you in getting involved and potentially +making things better. + + - Implement a query/filtering library that allows for finding patterns in the + XRay traces. + - Collecting function call stacks and how often they're encountered in the + XRay trace. + +