======================
Using Polly with Clang
======================

This documentation discusses how Polly can be used in Clang to automatically
optimize C/C++ code during compilation.


.. warning::

  clang/LLVM/Polly need to be in sync (compiled from the same source
  revision).

Make Polly available from Clang
===============================

Polly is available through clang, opt, and bugpoint if Polly was checked out
into tools/polly before compilation. No further configuration is needed.

Optimizing with Polly
=====================

Optimizing with Polly is as easy as adding ``-O3 -mllvm -polly`` to your
compiler flags. Polly is not available unless optimizations are enabled
(``-O1``, ``-O2`` or ``-O3``); optimizing for size with ``-Os`` or ``-Oz`` is
not recommended.

.. code-block:: console

  clang -O3 -mllvm -polly file.c

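For illustration, a hypothetical ``file.c`` might contain a kernel like the one
below. Its loop bounds and array subscripts are affine functions of the loop
counters, which is the kind of code Polly can model as a SCoP. The array size
``N`` and the function names are assumptions made for this sketch, and whether
Polly actually transforms the kernel is up to its own profitability heuristics.

.. code-block:: c

  /* Hypothetical file.c: a matrix multiplication over global arrays. */
  #define N 64

  double A[N][N], B[N][N], C[N][N];

  /* Fill the inputs with deterministic values and clear the result. */
  void init(void) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
        A[i][j] = (double)(i + j);
        B[i][j] = (double)(i - j);
        C[i][j] = 0.0;
      }
  }

  /* C = A * B: every loop bound and subscript is affine. */
  void matmul(void) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
          C[i][j] += A[i][k] * B[k][j];
  }

Since this sketch has no ``main``, add ``-c`` to the command above to only
compile it.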
Automatic OpenMP code generation
================================

To automatically detect parallel loops and generate OpenMP code for them, you
also need to add ``-mllvm -polly-parallel -lgomp`` to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c

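The sketch below, with hypothetical function names and sizes, contrasts a loop
that ``-polly-parallel`` can consider for OpenMP code generation with one it
cannot. In ``scale`` every outer iteration writes a different row of ``out``,
while in ``prefix_sum`` each iteration reads the value written by the previous
one. Whether Polly actually emits OpenMP code for ``scale`` also depends on its
own heuristics.

.. code-block:: c

  #define ROWS 1024
  #define COLS 1024

  float in[ROWS][COLS], out[ROWS][COLS];

  /* No loop-carried dependences: each outer iteration writes its own row,
     so different rows could be handled by different threads. */
  void scale(float factor) {
    for (int i = 0; i < ROWS; i++)
      for (int j = 0; j < COLS; j++)
        out[i][j] = factor * in[i][j];
  }

  /* Loop-carried dependence: a[i] needs the already updated a[i - 1],
     so the iterations must run in order. */
  void prefix_sum(float *a, int n) {
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }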
Switching the OpenMP backend
----------------------------

The following command-line switch selects Polly's OpenMP backend:

``-polly-omp-backend[=BACKEND]``
  Choose the OpenMP backend; ``BACKEND`` can be ``GNU`` (the default) or
  ``LLVM``.

The OpenMP backends can be further influenced using the following command-line
switches:

``-polly-num-threads[=NUM]``
  Set the number of threads to use; ``NUM`` may be any positive integer, or 0
  (the default) to let the OpenMP runtime decide.

``-polly-scheduling[=SCHED]``
  Set the OpenMP scheduling type; ``SCHED`` can be ``static``, ``dynamic``,
  ``guided`` or ``runtime`` (the default).

``-polly-scheduling-chunksize[=CHUNK]``
  Set the chunk size for the selected scheduling type; ``CHUNK`` may be any
  strictly positive integer (otherwise it defaults to 1).

Note that, at the time of writing, the GNU backend can only use the
``-polly-num-threads`` and ``-polly-scheduling`` switches, where the latter
also has to be set to ``runtime``.

Example: use the alternative (LLVM) backend with dynamic scheduling, four
threads and a chunk size of one (these switches are added to the ones above):

.. code-block:: console

  -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=4
  -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1

Automatic vector code generation
================================

Automatic vector code generation can be enabled by adding
``-mllvm -polly-vectorizer=stripmine`` to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c

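Strip-mining splits a loop into an outer loop over fixed-size chunks and an
inner loop with a constant trip count, which later passes can map onto vector
lanes. The hand-written sketch below shows the transformation itself on a
SAXPY-style loop; it is an illustration, not the code Polly generates, and the
chunk width ``WIDTH`` of 4 is an assumption.

.. code-block:: c

  #define WIDTH 4 /* hypothetical chunk (vector) width, for illustration */

  /* Original loop: y[i] = a * x[i] + y[i]. */
  void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
      y[i] = a * x[i] + y[i];
  }

  /* The same loop strip-mined by WIDTH: the inner loop has a constant trip
     count, and a scalar epilogue handles the last n % WIDTH iterations. */
  void saxpy_stripmined(int n, float a, const float *x, float *y) {
    int i = 0;
    for (; i + WIDTH <= n; i += WIDTH)
      for (int v = 0; v < WIDTH; v++)
        y[i + v] = a * x[i + v] + y[i + v];
    for (; i < n; i++) /* epilogue */
      y[i] = a * x[i] + y[i];
  }

Both functions compute the same result; only the loop structure differs.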
Isolate the Polly passes
========================

Polly's analysis and transformation passes are run together with many other
passes of the pass manager's pipeline. Some of the passes that run before
Polly are essential for it to work, for instance loop canonicalization.
Therefore, Polly is unable to optimize code straight out of clang's ``-O0``
output.

To get the LLVM-IR that Polly sees in the optimization pipeline, use the
command:

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-dump-before-file=before-polly.ll

This writes a file ``before-polly.ll`` containing the LLVM-IR as passed to
Polly, after SSA transformation, loop canonicalization, inlining and other
passes.

Thereafter, any Polly pass can be run over ``before-polly.ll`` using the
``opt`` tool. To find out which Polly passes are active in the standard
pipeline, see the output of

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -debug-pass=Arguments

Polly's passes are those between ``-polly-detect`` and ``-polly-codegen``.
Analysis passes can be omitted. At the time of this writing, the default Polly
pass pipeline is:

.. code-block:: console

  opt before-polly.ll -polly-simplify -polly-optree -polly-delicm -polly-simplify -polly-prune-unprofitable -polly-opt-isl -polly-codegen

Note that this uses LLVM's old/legacy pass manager.

For completeness, here are some other methods that generate IR suitable for
processing with Polly from C/C++/Objective-C source code. The previous method
is the recommended one.

The following generates unoptimized LLVM-IR (``-O0``, which is the default)
and runs the canonicalizing passes on it (``-polly-canonicalize``). This does
*not* include all the passes that run before Polly in the default pass
pipeline. The ``-disable-O0-optnone`` option is required because otherwise
clang adds an ``optnone`` attribute to all functions, so that they are skipped
by most optimization passes. The attribute is meant to prevent LTO builds from
optimizing these functions anyway during the linking phase.

.. code-block:: console

  clang file.c -c -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o - | opt -polly-canonicalize -S

The option ``-disable-llvm-passes`` disables all LLVM passes, even those that
run at ``-O0``. Passing ``-O1`` (or any optimization level other than
``-O0``) prevents clang from adding the ``optnone`` attribute.

.. code-block:: console

  clang file.c -c -O1 -Xclang -disable-llvm-passes -emit-llvm -S -o - | opt -polly-canonicalize -S

As another alternative, Polly can be pushed to the front of the pass pipeline
and the IR dumped from there. This implicitly runs the
``-polly-canonicalize`` passes.

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-position=early -mllvm -polly-dump-before-file=before-polly.ll

Further options
===============

Polly supports further options that are mainly useful for developing or
analyzing Polly. The relevant options can be added to clang by appending
``-mllvm -option-name`` to the CFLAGS or the clang command line.

Limit Polly to a single function
--------------------------------

To limit the execution of Polly to a single function, use the option
``-polly-only-func=functionname``.

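As a sketch, assume a hypothetical ``file.c`` with two loop nests. Compiling it
with ``clang -O3 -mllvm -polly -mllvm -polly-only-func=transpose -c file.c``
should let Polly process only ``transpose``, while ``copy`` goes through the
regular LLVM pipeline. The function names are made up for this example; the
sketch uses C because C++ function names are mangled in LLVM-IR, so the source
name and the IR name would not coincide.

.. code-block:: c

  #define N 256

  double src[N][N], dst[N][N];

  /* Matched by -polly-only-func=transpose. */
  void transpose(void) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        dst[j][i] = src[i][j];
  }

  /* Not matched by -polly-only-func=transpose, so Polly skips it. */
  void copy(void) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        dst[i][j] = src[i][j];
  }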
Disable LLVM-IR generation
--------------------------

Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparing transformations, with Polly's code
generation disabled, add the option ``-polly-no-codegen``.

Graphical view of the SCoPs
---------------------------

Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are ``-polly-show``, ``-polly-show-only``, ``-polly-dot`` and
``-polly-dot-only``. The 'show' options automatically run dotty or another
graphviz viewer to show the SCoPs graphically. The 'dot' options store, for
each function, a dot file that highlights the detected SCoPs. If 'only' is
appended at the end of the option, the basic blocks are shown without the
statements they contain.

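To have something to look at with these options, a hypothetical input such as
the one below can help. The loop nest in ``triangle`` has affine bounds and
subscripts and is a typical SCoP, whereas the loop in ``sum_list`` follows
pointers loaded from memory, so its trip count is not an affine expression and
it is not expected to show up as a SCoP. All names here are illustrative.

.. code-block:: c

  #include <stddef.h>

  #define N 128

  int A[N][N];

  /* Triangular loop nest: bounds and subscripts are affine. */
  void triangle(void) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j <= i; j++)
        A[i][j] = i + j;
  }

  struct node {
    int value;
    struct node *next;
  };

  /* Pointer chasing: the loop exit depends on values loaded from memory,
     which cannot be expressed as an affine condition. */
  int sum_list(const struct node *n) {
    int s = 0;
    for (; n != NULL; n = n->next)
      s += n->value;
    return s;
  }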
Change/Disable the Optimizer
----------------------------

By default, Polly uses the isl scheduling optimizer. The isl optimizer
optimizes for data locality and parallelism using the Pluto algorithm. To
disable the optimizer entirely, use the option ``-polly-optimizer=none``.

Disable tiling in the optimizer
-------------------------------

By default, the optimizer performs tiling if possible. If this is not wanted,
the option ``-polly-tiling=false`` can be used to disable it.

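Tiling splits a loop nest into blocks so that the data touched by one block
stays in cache while it is reused. The following hand-written sketch shows the
transformation on a matrix transposition; it only illustrates the idea and is
not Polly's generated code, and ``TILE`` is an arbitrary, hypothetical tile
size.

.. code-block:: c

  #define N 100
  #define TILE 32 /* hypothetical tile size; N need not be a multiple of it */

  static int min_int(int a, int b) { return a < b ? a : b; }

  /* Untiled: in[j][i] walks down a column, so consecutive inner iterations
     typically touch different cache lines. */
  void transpose_untiled(double out[N][N], double in[N][N]) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        out[i][j] = in[j][i];
  }

  /* Tiled: the outer two loops enumerate TILE x TILE blocks and the inner
     two stay within one block; min_int() clips the last, partial block. */
  void transpose_tiled(double out[N][N], double in[N][N]) {
    for (int ii = 0; ii < N; ii += TILE)
      for (int jj = 0; jj < N; jj += TILE)
        for (int i = ii; i < min_int(ii + TILE, N); i++)
          for (int j = jj; j < min_int(jj + TILE, N); j++)
            out[i][j] = in[j][i];
  }

Both versions produce the same result; tiling only changes the order in which
the elements are visited.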
Import / Export
---------------

The flags ``-polly-import`` and ``-polly-export`` allow the export and reimport
of the polyhedral representation. By exporting, modifying and reimporting the
polyhedral representation, externally calculated transformations can be
applied. This enables external optimizers or the manual optimization of
specific SCoPs.

Viewing Polly Diagnostics with opt-viewer
-----------------------------------------

The flag ``-fsave-optimization-record`` will generate ``.opt.yaml`` files when
compiling your program. These YAML files contain information about each
emitted remark. Ensure that you have Python 2.7 with the PyYAML and Pygments
packages installed. To run opt-viewer:

.. code-block:: console

  llvm/tools/opt-viewer/opt-viewer.py -source-dir /path/to/program/src/ \
     /path/to/program/src/foo.opt.yaml \
     /path/to/program/src/bar.opt.yaml \
     -o ./output

Include all YAML files (use ``*.opt.yaml`` when specifying which YAML files to
view) to view all diagnostics from your program in opt-viewer. Compile with
`PGO <https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation>`_
to view hotness information in opt-viewer. The resulting HTML files can be
viewed in a web browser.