=============================================================
How To Build Clang and LLVM with Profile-Guided Optimizations
=============================================================

Introduction
============

PGO (Profile-Guided Optimization) allows your compiler to better optimize code
for how it actually runs. Users report that applying this to Clang and LLVM can
decrease overall compile time by 20%.

This guide walks you through how to build Clang with PGO, though it also applies
to other subprojects, such as LLD.


Using the script
================

We have a script at ``utils/collect_and_build_with_pgo.py``. This script is
tested on a few Linux flavors, and requires a checkout of LLVM, Clang, and
compiler-rt. Despite the name, it performs four clean builds of Clang, so it
can take a while to run to completion. Please see the script's ``--help`` for
more information on how to run it, and the different options available to you.
If you want to get the most out of PGO for a particular use-case (e.g. compiling
a specific large piece of software), please do read the section below on
'benchmark' selection.

Please note that this script is only tested on a few Linux distros. Patches to
add support for other platforms, as always, are highly appreciated. :)

This script also supports a ``--dry-run`` option, which causes it to print
important commands instead of running them.
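As a rough sketch, a first run might look like the following. The ``LLVM_DIR``
variable and its default value are placeholders for illustration, not something
the script defines, and only the two options mentioned above are used. The
script may need further arguments on your system; ``--help`` lists them.

```shell
# Hypothetical location of the LLVM checkout; adjust to match your setup.
LLVM_DIR="${LLVM_DIR:-$HOME/llvm-project/llvm}"
SCRIPT="$LLVM_DIR/utils/collect_and_build_with_pgo.py"

if [ -f "$SCRIPT" ]; then
    # List the available options first.
    python3 "$SCRIPT" --help
    # Then print the important commands instead of running them.
    python3 "$SCRIPT" --dry-run
else
    echo "script not found at $SCRIPT; set LLVM_DIR to your checkout" >&2
fi
```

Reading the ``--dry-run`` output before a real run is a cheap way to check
which compilers and directories the script is going to use.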


Selecting 'benchmarks'
======================

PGO does best when the profiles gathered represent how the user plans to use the
compiler. Notably, highly accurate profiles of llc building x86_64 code aren't
incredibly helpful if you're going to be targeting ARM.

By default, the script above does two things to get solid coverage. It:

- runs all of Clang and LLVM's lit tests, and
- uses the instrumented Clang to build Clang, LLVM, and all of the other
  LLVM subprojects available to it.

Together, these should give you:

- solid coverage of building C++,
- good coverage of building C,
- great coverage of running optimizations,
- great coverage of the backend for your host's architecture, and
- some coverage of other architectures (if other arches are supported backends).

Altogether, this should cover a diverse set of uses for Clang and LLVM. If you
have very specific needs (e.g. your compiler is meant to compile a large browser
for four different platforms, or similar), you may want to do something else.
This is configurable in the script itself.


Building Clang with PGO
=======================

If you prefer to not use the script, this briefly goes over how to build
Clang/LLVM with PGO.

First, you should have at least LLVM, Clang, and compiler-rt checked out
locally.

Next, at a high level, you're going to need to do the following:

1. Build a standard Release Clang and the relevant libclang_rt.profile library
2. Build Clang using the Clang you built above, but with instrumentation
3. Use the instrumented Clang to generate profiles, which consists of two steps:

   - Running the instrumented Clang/LLVM/lld/etc. on tasks that represent how
     users will use said tools.
   - Using a tool to convert the "raw" profiles generated above into a single,
     final PGO profile.

4. Build a final release Clang (along with whatever other binaries you need)
   using the profile collected from your benchmark

In more detailed steps:

1. Configure a Clang build as you normally would. It's highly recommended that
   you use the Release configuration for this, since it will be used to build
   another Clang. Because you need Clang and supporting libraries, you'll want
   to build the ``all`` target (e.g. ``ninja all`` or ``make -j4 all``).

2. Configure a Clang build as above, but add the following CMake args:

   - ``-DLLVM_BUILD_INSTRUMENTED=IR`` -- This causes us to build everything
     with instrumentation.
   - ``-DLLVM_BUILD_RUNTIME=No`` -- A few projects have bad interactions when
     built with profiling, and aren't necessary to build. This flag turns them
     off.
   - ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
     step 1.
   - ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.

   In this build directory, you simply need to build the ``clang`` target (and
   whatever supporting tooling your benchmark requires).

3. As mentioned above, this has two steps: gathering profile data, and then
   massaging it into a useful form:

   a. Build your benchmark using the Clang generated in step 2. The 'standard'
      benchmark recommended is to run ``check-clang`` and ``check-llvm`` in your
      instrumented Clang's build directory, and to do a full build of Clang/LLVM
      using your instrumented Clang. So, create yet another build directory,
      with the following CMake arguments:

      - ``-DCMAKE_C_COMPILER=/path/to/stage2/clang`` - Use the Clang we built in
        step 2.
      - ``-DCMAKE_CXX_COMPILER=/path/to/stage2/clang++`` - Same as above.

      If your users are fans of debug info, you may want to consider using
      ``-DCMAKE_BUILD_TYPE=RelWithDebInfo`` instead of
      ``-DCMAKE_BUILD_TYPE=Release``. This will grant better coverage of
      debug info pieces of clang, but will take longer to complete and will
      result in a much larger build directory.

      It's recommended to build the ``all`` target with your instrumented Clang,
      since more coverage is often better.

   b. You should now have a few ``*.profraw`` files in
      ``path/to/stage2/profiles/``. You need to merge these using
      ``llvm-profdata`` (even if you only have one! The profile merge transforms
      profraw into actual profile data, as well). This can be done with
      ``/path/to/stage1/llvm-profdata merge
      -output=/path/to/output/profdata.prof path/to/stage2/profiles/*.profraw``.

4. Now, build your final, PGO-optimized Clang. To do this, you'll want to pass
   the following additional arguments to CMake.

   - ``-DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof`` - Use the PGO
     profile from the previous step.
   - ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
     step 1.
   - ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.

   From here, you can build whatever targets you need.

   .. note::
      You may see warnings about a mismatched profile in the build output. These
      are generally harmless. To silence them, you can add
      ``-DCMAKE_C_FLAGS='-Wno-backend-plugin'
      -DCMAKE_CXX_FLAGS='-Wno-backend-plugin'`` to your CMake invocation.
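
The four steps above can be sketched as one shell session. This is only an
illustration under several assumptions that are not part of the steps
themselves: the variable names and default paths are placeholders, the ``run``
helper is hypothetical, the Ninja generator is used, the built tools are
assumed to live in each build directory's ``bin/`` subdirectory, and the
``-S``/``-B`` options need CMake 3.13 or newer. By default the sketch only
prints the commands; set ``DRY_RUN=0`` to execute them.

```shell
# Placeholder paths (assumptions); point these at your own checkout and build area.
SRC="${SRC:-$HOME/llvm-project/llvm}"
OUT="${OUT:-$HOME/pgo-build}"
STAGE1="$OUT/stage1"      # step 1: plain Release build
STAGE2="$OUT/stage2"      # step 2: instrumented build
BENCH="$OUT/benchmark"    # step 3a: build done by the instrumented Clang
FINAL="$OUT/final"        # step 4: PGO-optimized build
PROFDATA="$OUT/profdata.prof"

# Hypothetical helper: print each command, and run it only when DRY_RUN=0.
# When executing for real, consider adding `set -e` so a failure stops the run.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Step 1: standard Release build of everything.
run cmake -G Ninja -S "$SRC" -B "$STAGE1" -DCMAKE_BUILD_TYPE=Release
run ninja -C "$STAGE1" all

# Step 2: instrumented Clang, compiled by the stage 1 Clang.
run cmake -G Ninja -S "$SRC" -B "$STAGE2" -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_BUILD_INSTRUMENTED=IR -DLLVM_BUILD_RUNTIME=No \
    -DCMAKE_C_COMPILER="$STAGE1/bin/clang" \
    -DCMAKE_CXX_COMPILER="$STAGE1/bin/clang++"
run ninja -C "$STAGE2" clang

# Step 3a: exercise the instrumented Clang (lit tests plus a full build).
run ninja -C "$STAGE2" check-clang check-llvm
run cmake -G Ninja -S "$SRC" -B "$BENCH" -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_C_COMPILER="$STAGE2/bin/clang" \
    -DCMAKE_CXX_COMPILER="$STAGE2/bin/clang++"
run ninja -C "$BENCH" all

# Step 3b: merge the raw profiles into a single profile.
run "$STAGE1/bin/llvm-profdata" merge -output="$PROFDATA" \
    "$STAGE2"/profiles/*.profraw

# Step 4: final build, compiled by the stage 1 Clang using the merged profile.
run cmake -G Ninja -S "$SRC" -B "$FINAL" -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_PROFDATA_FILE="$PROFDATA" \
    -DCMAKE_C_COMPILER="$STAGE1/bin/clang" \
    -DCMAKE_CXX_COMPILER="$STAGE1/bin/clang++"
run ninja -C "$FINAL" all
```

Keeping each stage in its own build directory mirrors the steps above and makes
it easy to delete the intermediate stages once the final build is done.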


Congrats! You now have a Clang built with profile-guided optimizations, and you
can delete all but the final build directory if you'd like.

If this worked well for you and you plan on doing it often, there's a slight
optimization that can be made: LLVM and Clang have a tool called tblgen that's
built and run during the build process. While it's potentially nice to build
this for coverage as part of step 3, none of your other builds should benefit
from building it. You can pass the CMake options
``-DCLANG_TABLEGEN=/path/to/stage1/bin/clang-tblgen
-DLLVM_TABLEGEN=/path/to/stage1/bin/llvm-tblgen`` to steps 2 and onward to avoid
these useless rebuilds.
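
As a small sketch, the two options can be kept in one variable and appended to
each later CMake invocation. The ``STAGE1`` and ``TBLGEN_ARGS`` names are
placeholders chosen for this example, and the unquoted expansion shown in the
comment assumes the path contains no spaces.

```shell
# Placeholder for the stage 1 build directory (an assumption; adjust as needed).
STAGE1="${STAGE1:-/path/to/stage1}"

# Extra options for the CMake invocations of steps 2 and onward.
TBLGEN_ARGS="-DCLANG_TABLEGEN=$STAGE1/bin/clang-tblgen -DLLVM_TABLEGEN=$STAGE1/bin/llvm-tblgen"

# Print them; in a real invocation you would append $TBLGEN_ARGS (unquoted)
# to the cmake command line of each later stage.
echo "$TBLGEN_ARGS"
```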