annotate gcc/doc/passes.texi @ 63:b7f97abdc517 gcc-4.6-20100522
update gcc from gcc-4.5.0 to gcc-4.6
author | ryoma <e075725@ie.u-ryukyu.ac.jp> |
---|---|
date | Mon, 24 May 2010 12:47:05 +0900 |
parents | 77e2b8dfacca |
children | f6334be47118 |
rev | line source |
---|---|
0 | 1 @c markers: CROSSREF BUG TODO |
2 | |
3 @c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, | |
4 @c 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software | |
5 @c Foundation, Inc. | |
6 @c This is part of the GCC manual. | |
7 @c For copying conditions, see the file gcc.texi. | |
8 | |
9 @node Passes | |
10 @chapter Passes and Files of the Compiler | |
11 @cindex passes and files of the compiler | |
12 @cindex files and passes of the compiler | |
13 @cindex compiler passes and files | |
14 | |
15 This chapter is dedicated to giving an overview of the optimization and | |
16 code generation passes of the compiler. In the process, it describes | |
17 some of the language front end interface, though this description is | |
18 nowhere near complete. | |
19 | |
20 @menu | |
21 * Parsing pass:: The language front end turns text into bits. | |
22 * Gimplification pass:: The bits are turned into something we can optimize. | |
23 * Pass manager:: Sequencing the optimization passes. | |
24 * Tree SSA passes:: Optimizations on a high-level representation. |
0 | 25 * RTL passes:: Optimizations on a low-level representation. |
26 @end menu | |
27 | |
28 @node Parsing pass | |
29 @section Parsing pass | |
30 @cindex GENERIC | |
31 @findex lang_hooks.parse_file | |
32 The language front end is invoked only once, via | |
33 @code{lang_hooks.parse_file}, to parse the entire input. The language | |
34 front end may use any intermediate language representation deemed | |
35 appropriate. The C front end uses GENERIC trees (CROSSREF), plus | |
36 a double handful of language specific tree codes defined in | |
37 @file{c-common.def}. The Fortran front end uses a completely different | |
38 private representation. | |
39 | |
40 @cindex GIMPLE | |
41 @cindex gimplification | |
42 @cindex gimplifier | |
43 @cindex language-independent intermediate representation | |
44 @cindex intermediate representation lowering | |
45 @cindex lowering, language-dependent intermediate representation | |
46 At some point the front end must translate the representation used in the | |
47 front end to a representation understood by the language-independent | |
48 portions of the compiler. Current practice takes one of two forms. | |
49 The C front end manually invokes the gimplifier (CROSSREF) on each function, | |
50 and uses the gimplifier callbacks to convert the language-specific tree | |
51 nodes directly to GIMPLE (CROSSREF) before passing the function off to | |
52 be compiled. | |
53 The Fortran front end converts from a private representation to GENERIC, | |
54 which is later lowered to GIMPLE when the function is compiled. Which | |
55 route to choose probably depends on how well GENERIC (plus extensions) | |
56 can be made to match up with the source language and necessary parsing | |
57 data structures. | |
58 | |
59 BUG: Gimplification must occur before nested function lowering, | |
60 and nested function lowering must be done by the front end before | |
61 passing the data off to cgraph. | |
62 | |
63 TODO: Cgraph should control nested function lowering. It would | |
64 only be invoked when it is certain that the outer-most function | |
65 is used. | |
66 | |
67 TODO: Cgraph needs a gimplify_function callback. It should be | |
68 invoked when (1) it is certain that the function is used, (2) | |
69 warning flags specified by the user require some amount of | |
70 compilation in order to honor, (3) the language indicates that | |
71 semantic analysis is not complete until gimplification occurs. | |
72 Hum@dots{} this sounds overly complicated. Perhaps we should just | |
73 have the front end gimplify always; in most cases it's only one | |
74 function call. | |
75 | |
76 The front end needs to pass all function definitions and top level | |
77 declarations off to the middle-end so that they can be compiled and | |
78 emitted to the object file. For a simple procedural language, it is | |
79 usually most convenient to do this as each top level declaration or | |
80 definition is seen. There is also a distinction to be made between | |
81 generating functional code and generating complete debug information. | |
82 The only thing that is absolutely required for functional code is that | |
83 function and data @emph{definitions} be passed to the middle-end. For | |
84 complete debug information, function, data and type declarations | |
85 should all be passed as well. | |
86 | |
87 @findex rest_of_decl_compilation | |
88 @findex rest_of_type_compilation | |
89 @findex cgraph_finalize_function | |
90 In any case, each complete top-level function or | |
91 data declaration, and each data definition, should be passed to | |
92 @code{rest_of_decl_compilation}. Each complete type definition should | |
93 be passed to @code{rest_of_type_compilation}. Each function definition | |
94 should be passed to @code{cgraph_finalize_function}. | |
95 | |
96 TODO: I know rest_of_compilation currently has all sorts of | |
97 RTL generation semantics. I plan to move all code generation |
98 bits (both Tree and RTL) to compile_function. Should we hide |
0 | 99 cgraph from the front ends and move back to rest_of_compilation |
100 as the official interface? Possibly we should rename all three | |
101 interfaces such that the names match in some meaningful way and | |
102 that is more descriptive than "rest_of". | |
103 | |
104 The middle-end will, at its option, emit the function and data | |
105 definitions immediately or queue them for later processing. | |
106 | |
107 @node Gimplification pass | |
108 @section Gimplification pass | |
109 | |
110 @cindex gimplification | |
111 @cindex GIMPLE | |
112 @dfn{Gimplification} is a whimsical term for the process of converting | |
113 the intermediate representation of a function into the GIMPLE language | |
114 (CROSSREF). The term stuck, and so words like ``gimplification'', | |
115 ``gimplify'', ``gimplifier'' and the like are sprinkled throughout this | |
116 section of code. | |
117 | |
118 @cindex GENERIC | |
119 While a front end may certainly choose to generate GIMPLE directly if | |
120 it chooses, this can be a moderately complex process unless the | |
121 intermediate language used by the front end is already fairly simple. | |
122 Usually it is easier to generate GENERIC trees plus extensions | |
123 and let the language-independent gimplifier do most of the work. | |
124 | |
125 @findex gimplify_function_tree | |
126 @findex gimplify_expr | |
127 @findex lang_hooks.gimplify_expr | |
128 The main entry point to this pass is @code{gimplify_function_tree} | |
129 located in @file{gimplify.c}. From here we process the entire | |
130 function gimplifying each statement in turn. The main workhorse | |
131 for this pass is @code{gimplify_expr}. Approximately everything | |
132 passes through here at least once, and it is from here that we | |
133 invoke the @code{lang_hooks.gimplify_expr} callback. | |
134 | |
135 The callback should examine the expression in question and return | |
136 @code{GS_UNHANDLED} if the expression is not a language specific | |
137 construct that requires attention. Otherwise it should alter the | |
138 expression in some way such that forward progress is made toward | |
139 producing valid GIMPLE@. If the callback is certain that the | |
140 transformation is complete and the expression is valid GIMPLE, it | |
141 should return @code{GS_ALL_DONE}. Otherwise it should return | |
142 @code{GS_OK}, which will cause the expression to be processed again. | |
143 If the callback encounters an error during the transformation (because | |
144 the front end is relying on the gimplification process to finish | |
145 semantic checks), it should return @code{GS_ERROR}. | |
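The return-value protocol described above can be sketched with a toy model.
Everything here is hypothetical: the @code{toy_} types, the expression kinds and
the driver loop are assumptions made for illustration, and the real callback
operates on trees with a different signature. The sketch only shows how the
four statuses steer repeated processing.

```c
/* Toy model only: the status names mirror those described in the text,
   but these types and functions are hypothetical, not GCC's.  */
enum toy_status { TOY_GS_ERROR, TOY_GS_UNHANDLED, TOY_GS_OK, TOY_GS_ALL_DONE };

enum toy_kind { TOY_PLAIN, TOY_SUGAR2, TOY_SUGAR1, TOY_SIMPLE, TOY_BROKEN };

struct toy_expr { enum toy_kind kind; };

/* Hypothetical language callback: lower one step at a time.  */
static enum toy_status
toy_lang_gimplify_expr (struct toy_expr *e)
{
  switch (e->kind)
    {
    case TOY_SUGAR2:            /* Needs two lowering steps.  */
      e->kind = TOY_SUGAR1;
      return TOY_GS_OK;         /* Progress made; process again.  */
    case TOY_SUGAR1:
      e->kind = TOY_SIMPLE;
      return TOY_GS_ALL_DONE;   /* Now in final form.  */
    case TOY_BROKEN:
      return TOY_GS_ERROR;      /* A semantic check failed.  */
    default:
      return TOY_GS_UNHANDLED;  /* Not a language-specific construct.  */
    }
}

/* Hypothetical driver: re-invoke the callback while it reports GS_OK,
   counting the invocations in *CALLS.  */
static enum toy_status
toy_gimplify (struct toy_expr *e, int *calls)
{
  enum toy_status s;
  do
    {
      s = toy_lang_gimplify_expr (e);
      ++*calls;
    }
  while (s == TOY_GS_OK);
  return s;
}
```

In this model a @code{TOY_SUGAR2} expression takes two callback invocations to
reach its final form, while a @code{TOY_PLAIN} expression is handed back
untouched after one.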
146 | |
147 @node Pass manager | |
148 @section Pass manager | |
149 | |
150 The pass manager is located in @file{passes.c}, @file{tree-optimize.c} | |
151 and @file{tree-pass.h}. | |
152 Its job is to run all of the individual passes in the correct order, | |
153 and take care of standard bookkeeping that applies to every pass. | |
154 | |
155 The theory of operation is that each pass defines a structure that | |
156 represents everything we need to know about that pass---when it | |
157 should be run, how it should be run, what intermediate language | |
158 form or on-the-side data structures it needs. We register the pass | |
159 to be run in some particular order, and the pass manager arranges | |
160 for everything to happen in the correct order. | |
161 | |
162 The actuality doesn't completely live up to the theory at present. | |
163 Command-line switches and @code{timevar_id_t} enumerations must still | |
164 be defined elsewhere. The pass manager validates constraints but does | |
165 not attempt to (re-)generate data structures or lower intermediate | |
166 language form based on the requirements of the next pass. Nevertheless, | |
167 what is present is useful, and a far sight better than nothing at all. | |
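As a rough illustration of the theory of operation, the following sketch models
a pass descriptor and a manager that validates constraints without trying to
regenerate anything. The structure, its field names and the property bits are
assumptions invented for this example; they are not the definitions found in
@file{tree-pass.h}.

```c
/* Hypothetical, simplified pass descriptor; the field names are
   illustrative and are not taken from tree-pass.h.  */
struct toy_pass
{
  const char *name;
  unsigned required;   /* Properties that must hold before the pass.  */
  unsigned provided;   /* Properties established by the pass.  */
  unsigned destroyed;  /* Properties invalidated by the pass.  */
};

enum { TOY_PROP_GIMPLE = 1, TOY_PROP_CFG = 2, TOY_PROP_SSA = 4 };

/* Walk the passes in registration order, only validating requirements.
   Return the index of the first pass whose requirements are unmet,
   or -1 if every pass could run.  */
static int
toy_run_passes (const struct toy_pass *passes, int n, unsigned props)
{
  for (int i = 0; i < n; i++)
    {
      if ((passes[i].required & props) != passes[i].required)
        return i;   /* Validation fails; nothing is regenerated.  */
      props = (props | passes[i].provided) & ~passes[i].destroyed;
    }
  return -1;
}
```

Registering a pass that needs the CFG before the pass that builds it makes the
toy manager report the offending index rather than build the CFG on demand,
which matches the limitation described above.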
168 | |
169 Each pass should have a unique name. | |
170 Each pass may have its own dump file (for GCC debugging purposes). | |
171 Passes with a name starting with a star do not dump anything. | |
172 Sometimes passes are supposed to share a dump file / option name. | |
173 To still give these unique names, you can use a prefix that is delimited | |
174 by a space from the part that is used for the dump file / option name. | |
175 For example, when the pass name is "ud dce", the name used for the dump | |
176 file and option is "dce". | |
0 | 177 |
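The naming rules above can be captured in a few lines. This is a minimal
sketch, assuming the rules are applied exactly as stated; the helper
@code{toy_dump_name} is hypothetical and is not a function of the pass manager.

```c
#include <string.h>

/* Hypothetical helper: return the part of PASS_NAME used for the dump
   file and option name, or NULL if the pass does not dump anything.  */
static const char *
toy_dump_name (const char *pass_name)
{
  if (pass_name[0] == '*')
    return NULL;                        /* Starred passes do not dump.  */
  const char *space = strchr (pass_name, ' ');
  return space ? space + 1 : pass_name; /* Skip the uniquifying prefix.  */
}
```

With this helper, "ud dce" and "dce" both map to "dce", so the two passes share
a dump file and option name while keeping unique pass names.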
178 TODO: describe the global variables set up by the pass manager, | |
179 and a brief description of how a new pass should use it. | |
180 I need to look at what info RTL passes use first@enddots{} |
0 | 181 |
182 @node Tree SSA passes |
183 @section Tree SSA passes |
0 | 184 |
185 The following briefly describes the Tree optimization passes that are |
0 | 186 run after gimplification and what source files they are located in. |
187 | |
188 @itemize @bullet | |
189 @item Remove useless statements | |
190 | |
191 This pass is an extremely simple sweep across the gimple code in which | |
192 we identify obviously dead code and remove it. Here we do things like | |
193 simplify @code{if} statements with constant conditions, remove | |
194 exception handling constructs surrounding code that obviously cannot | |
195 throw, remove lexical bindings that contain no variables, and other | |
196 assorted simplistic cleanups. The idea is to get rid of the obvious | |
197 stuff quickly rather than wait until later when it's more work to get | |
198 rid of it. This pass is located in @file{tree-cfg.c} and described by | |
199 @code{pass_remove_useless_stmts}. | |
200 | |
201 @item Mudflap declaration registration | |
202 | |
203 If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth | |
204 -fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is | |
205 enabled, we generate code to register some variable declarations with | |
206 the mudflap runtime. Specifically, the runtime tracks the lifetimes of | |
207 those variable declarations that have their addresses taken, or whose | |
208 bounds are unknown at compile time (@code{extern}). This pass generates | |
209 new exception handling constructs (@code{try}/@code{finally}), and so | |
210 must run before those are lowered. In addition, the pass enqueues | |
211 declarations of static variables whose lifetimes extend to the entire | |
212 program. The pass is located in @file{tree-mudflap.c} and is described | |
213 by @code{pass_mudflap_1}. | |
214 | |
215 @item OpenMP lowering | |
216 | |
217 If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers | |
218 OpenMP constructs into GIMPLE. | |
219 | |
220 Lowering of OpenMP constructs involves creating replacement | |
221 expressions for local variables that have been mapped using data | |
222 sharing clauses, exposing the control flow of most synchronization | |
223 directives and adding region markers to facilitate the creation of the | |
224 control flow graph. The pass is located in @file{omp-low.c} and is | |
225 described by @code{pass_lower_omp}. | |
226 | |
227 @item OpenMP expansion | |
228 | |
229 If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands | |
230 parallel regions into their own functions to be invoked by the thread | |
231 library. The pass is located in @file{omp-low.c} and is described by | |
232 @code{pass_expand_omp}. | |
233 | |
234 @item Lower control flow | |
235 | |
236 This pass flattens @code{if} statements (@code{COND_EXPR}) | |
237 and moves lexical bindings (@code{BIND_EXPR}) out of line. After | |
238 this pass, each @code{if} statement will have exactly two @code{goto} | |
239 statements, one in each of its @code{then} and @code{else} arms. Lexical binding | |
240 information for each statement will be found in @code{TREE_BLOCK} rather | |
241 than being inferred from its position under a @code{BIND_EXPR}. This | |
242 pass is found in @file{gimple-low.c} and is described by | |
243 @code{pass_lower_cf}. | |
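To make the flattened shape concrete, here is a hand-written C analogy. It is
not compiler output and does not use GIMPLE syntax; the function names are
invented for the example. The second function mimics the lowered form, in which
each arm of the @code{if} is nothing but a @code{goto}.

```c
/* Structured form, as a front end might present it.  */
static int
abs_structured (int x)
{
  int r;
  if (x < 0)
    r = -x;
  else
    r = x;
  return r;
}

/* Hand-flattened form: both arms of the `if' are plain gotos, and the
   former arm bodies live at labels.  */
static int
abs_flattened (int x)
{
  int r;
  if (x < 0) goto then_arm; else goto else_arm;
then_arm:
  r = -x;
  goto done;
else_arm:
  r = x;
done:
  return r;
}
```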
244 | |
245 @item Lower exception handling control flow | |
246 | |
247 This pass decomposes high-level exception handling constructs | |
248 (@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form | |
249 that explicitly represents the control flow involved. After this | |
250 pass, @code{lookup_stmt_eh_region} will return a non-negative | |
251 number for any statement that may have EH control flow semantics; | |
252 examine @code{tree_can_throw_internal} or @code{tree_can_throw_external} | |
253 for exact semantics. Exact control flow may be extracted from | |
254 @code{foreach_reachable_handler}. The EH region nesting tree is defined | |
255 in @file{except.h} and built in @file{except.c}. The lowering pass | |
256 itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}. | |
257 | |
258 @item Build the control flow graph | |
259 | |
260 This pass decomposes a function into basic blocks and creates all of | |
261 the edges that connect them. It is located in @file{tree-cfg.c} and | |
262 is described by @code{pass_build_cfg}. | |
263 | |
264 @item Find all referenced variables | |
265 | |
266 This pass walks the entire function and collects an array of all | |
267 variables referenced in the function, @code{referenced_vars}. The | |
268 index at which a variable is found in the array is used as a UID | |
269 for the variable within this function. This data is needed by the | |
270 SSA rewriting routines. The pass is located in @file{tree-dfa.c} | |
271 and is described by @code{pass_referenced_vars}. | |
272 | |
273 @item Enter static single assignment form | |
274 | |
275 This pass rewrites the function such that it is in SSA form. After | |
276 this pass, all @code{is_gimple_reg} variables will be referenced by | |
277 @code{SSA_NAME}, and all occurrences of other variables will be | |
278 annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have | |
279 been inserted as necessary for each basic block. This pass is | |
280 located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}. | |
281 | |
282 @item Warn for uninitialized variables | |
283 | |
284 This pass scans the function for uses of @code{SSA_NAME}s that | |
285 are fed by a default definition. For non-parameter variables, such | |
286 uses are uninitialized. The pass is run twice, before and after | |
287 optimization (if turned on). In the first pass we only warn for uses that are | |
288 positively uninitialized; in the second pass we warn for uses that | |
289 are possibly uninitialized. The pass is located in @file{tree-ssa.c} | |
290 and is defined by @code{pass_early_warn_uninitialized} and | |
291 @code{pass_late_warn_uninitialized}. | |
292 | |
293 @item Dead code elimination | |
294 | |
295 This pass scans the function for statements without side effects whose | |
296 result is unused. It does not do memory life analysis, so any value | |
297 that is stored in memory is considered used. The pass is run multiple | |
298 times throughout the optimization process. It is located in | |
299 @file{tree-ssa-dce.c} and is described by @code{pass_dce}. | |
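The idea can be sketched on a toy statement list. This is an assumption-laden
model rather than the algorithm in @file{tree-ssa-dce.c}: the
@code{toy_stmt} structure is invented, SSA names are plain integers, and
anything that touches memory is simply flagged as having a side effect, which
mirrors the rule that stored values are considered used.

```c
#include <stdbool.h>

/* Toy SSA statement: DEF = op (USE1, USE2); -1 means "no operand".  */
struct toy_stmt
{
  int def, use1, use2;
  bool side_effect;    /* E.g. a store to memory or a return.  */
  bool live;           /* Output: set by toy_dce.  */
};

/* Mark statements with side effects as live, then repeatedly mark the
   definitions they depend on.  Return the number of dead statements.  */
static int
toy_dce (struct toy_stmt *s, int n)
{
  bool changed = true;
  for (int i = 0; i < n; i++)
    s[i].live = s[i].side_effect;
  while (changed)
    {
      changed = false;
      for (int i = 0; i < n; i++)
        if (s[i].live)
          for (int j = 0; j < n; j++)
            if (!s[j].live && s[j].def >= 0
                && (s[j].def == s[i].use1 || s[j].def == s[i].use2))
              s[j].live = changed = true;
    }
  int removed = 0;
  for (int i = 0; i < n; i++)
    removed += !s[i].live;
  return removed;
}
```

A computation that feeds only other unused computations stays unmarked, so
whole dead chains disappear in one run.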
300 | |
301 @item Dominator optimizations | |
302 | |
303 This pass performs trivial dominator-based copy and constant propagation, | |
304 expression simplification, and jump threading. It is run multiple times | |
305 throughout the optimization process. It is located in @file{tree-ssa-dom.c} |
0 | 306 and is described by @code{pass_dominator}. |
307 | |
308 @item Forward propagation of single-use variables | |
309 | |
310 This pass attempts to remove redundant computation by substituting | |
311 variables that are used once into the expression that uses them and | |
312 seeing if the result can be simplified. It is located in | |
313 @file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}. | |
314 | |
315 @item Copy Renaming | |
316 | |
317 This pass attempts to change the name of compiler temporaries involved in | |
318 copy operations such that SSA->normal can coalesce the copy away. When compiler | |
319 temporaries are copies of user variables, it also renames the compiler | |
320 temporary to the user variable resulting in better use of user symbols. It is | |
321 located in @file{tree-ssa-copyrename.c} and is described by | |
322 @code{pass_copyrename}. | |
323 | |
324 @item PHI node optimizations | |
325 | |
326 This pass recognizes forms of PHI inputs that can be represented as | |
327 conditional expressions and rewrites them into straight line code. | |
328 It is located in @file{tree-ssa-phiopt.c} and is described by | |
329 @code{pass_phiopt}. | |
330 | |
331 @item May-alias optimization | |
332 | |
333 This pass performs a flow sensitive SSA-based points-to analysis. | |
334 The resulting may-alias, must-alias, and escape analysis information | |
335 is used to promote variables from in-memory addressable objects to | |
336 non-aliased variables that can be renamed into SSA form. We also | |
337 update the @code{VDEF}/@code{VUSE} memory tags for non-renameable | |
338 aggregates so that we get fewer false kills. The pass is located | |
339 in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}. | |
340 | |
341 Interprocedural points-to information is located in | |
342 @file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}. | |
343 | |
344 @item Profiling | |
345 | |
346 This pass rewrites the function in order to collect runtime block | |
347 and value profiling data. Such data may be fed back into the compiler | |
348 on a subsequent run so as to allow optimization based on expected | |
349 execution frequencies. The pass is located in @file{predict.c} and | |
350 is described by @code{pass_profile}. | |
351 | |
352 @item Lower complex arithmetic | |
353 | |
354 This pass rewrites complex arithmetic operations into their component | |
355 scalar arithmetic operations. The pass is located in @file{tree-complex.c} | |
356 and is described by @code{pass_lower_complex}. | |
357 | |
358 @item Scalar replacement of aggregates | |
359 | |
360 This pass rewrites suitable non-aliased local aggregate variables into | |
361 a set of scalar variables. The resulting scalar variables are | |
362 rewritten into SSA form, which allows subsequent optimization passes | |
363 to do a significantly better job with them. The pass is located in | |
364 @file{tree-sra.c} and is described by @code{pass_sra}. | |
365 | |
366 @item Dead store elimination | |
367 | |
368 This pass eliminates stores to memory that are subsequently overwritten | |
369 by another store, without any intervening loads. The pass is located | |
370 in @file{tree-ssa-dse.c} and is described by @code{pass_dse}. | |
371 | |
372 @item Tail recursion elimination | |
373 | |
374 This pass transforms tail recursion into a loop. It is located in | |
375 @file{tree-tailcall.c} and is described by @code{pass_tail_recursion}. | |
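As a source-level analogy (hand-written, with invented function names, not
compiler output), the transformation turns the first function below into
something equivalent to the second.

```c
/* Tail-recursive form: the recursive call is the last action.  */
static unsigned long
sum_to_rec (unsigned n, unsigned long acc)
{
  if (n == 0)
    return acc;
  return sum_to_rec (n - 1, acc + n);
}

/* Equivalent loop: the parameters are updated in place and control
   returns to the top of the function instead of calling it again.  */
static unsigned long
sum_to_loop (unsigned n, unsigned long acc)
{
  while (n != 0)
    {
      acc += n;
      n -= 1;
    }
  return acc;
}
```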
376 | |
377 @item Forward store motion | |
378 | |
379 This pass sinks stores and assignments down the flowgraph closer to their | |
380 use point. The pass is located in @file{tree-ssa-sink.c} and is | |
381 described by @code{pass_sink_code}. | |
382 | |
383 @item Partial redundancy elimination | |
384 | |
385 This pass eliminates partially redundant computations, as well as | |
386 performing load motion. The pass is located in @file{tree-ssa-pre.c} | |
387 and is described by @code{pass_pre}. | |
388 | |
389 Just before partial redundancy elimination, if | |
390 @option{-funsafe-math-optimizations} is on, GCC tries to convert | |
391 divisions to multiplications by the reciprocal. The pass is located | |
392 in @file{tree-ssa-math-opts.c} and is described by | |
393 @code{pass_cse_reciprocal}. | |
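The reciprocal conversion can be pictured at the source level as follows. The
functions are hand-written for illustration. In general the two forms may round
differently, which is presumably why the conversion is tied to
@option{-funsafe-math-optimizations}; the divisor used in any check should be
chosen so that both forms are exact.

```c
/* Two divisions by the same divisor.  */
static void
scale_div (double *x, double *y, double d)
{
  *x = *x / d;
  *y = *y / d;
}

/* One division to form the reciprocal, then two multiplications.  */
static void
scale_recip (double *x, double *y, double d)
{
  double r = 1.0 / d;
  *x = *x * r;
  *y = *y * r;
}
```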
394 | |
395 @item Full redundancy elimination | |
396 | |
397 This is a simpler form of PRE that only eliminates redundancies that | |
398 occur on all paths. It is located in @file{tree-ssa-pre.c} and | |
399 described by @code{pass_fre}. | |
400 | |
401 @item Loop optimization | |
402 | |
403 The main driver of the pass is placed in @file{tree-ssa-loop.c} | |
404 and described by @code{pass_loop}. | |
405 | |
406 The optimizations performed by this pass are: | |
407 | |
408 Loop invariant motion. This pass moves only invariants that | |
409 would be hard to handle on RTL level (function calls, operations that expand to |
0 | 410 nontrivial sequences of insns). With @option{-funswitch-loops} it also moves |
411 operands of conditions that are invariant out of the loop, so that we can use | |
412 just trivial invariantness analysis in loop unswitching. The pass also includes | |
413 store motion. The pass is implemented in @file{tree-ssa-loop-im.c}. | |
414 | |
415 Canonical induction variable creation. This pass creates a simple counter | |
416 for the number of iterations of the loop and uses it to replace the loop's exit | |
417 condition, in cases where a complicated analysis is necessary to determine | |
418 the number of iterations. Later optimizations then may determine the number | |
419 easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}. | |
420 | |
421 Induction variable optimizations. This pass performs standard induction | |
422 variable optimizations, including strength reduction, induction variable | |
423 merging and induction variable elimination. The pass is implemented in | |
424 @file{tree-ssa-loop-ivopts.c}. | |
425 | |
426 Loop unswitching. This pass moves the conditional jumps that are invariant | |
427 out of the loops. To achieve this, a duplicate of the loop is created for | |
428 each possible outcome of conditional jump(s). The pass is implemented in | |
429 @file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the | |
430 RTL level loop unswitching in @file{loop-unswitch.c}, but currently |
431 the RTL level pass is not completely redundant yet due to deficiencies |
0 | 432 in tree level alias analysis. |
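Loop unswitching can be illustrated with a hand-written source-level analogy;
the function names are invented and the second function is only what the result
might look like, not actual compiler output. The invariant test on
@code{flag} is hoisted, and one copy of the loop is kept for each outcome.

```c
/* Before: the condition is tested on every iteration although its
   outcome never changes inside the loop.  */
static void
fill_before (int *a, int n, int flag)
{
  for (int i = 0; i < n; i++)
    {
      if (flag)
        a[i] = i;
      else
        a[i] = -i;
    }
}

/* After: the invariant condition is tested once, and each outcome
   gets its own duplicate of the loop.  */
static void
fill_unswitched (int *a, int n, int flag)
{
  if (flag)
    for (int i = 0; i < n; i++)
      a[i] = i;
  else
    for (int i = 0; i < n; i++)
      a[i] = -i;
}
```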
433 | |
434 The optimizations also use various utility functions contained in | |
435 @file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and | |
436 @file{cfgloopmanip.c}. | |
437 | |
438 Vectorization. This pass transforms loops to operate on vector types | |
439 instead of scalar types. Data parallelism across loop iterations is exploited | |
440 to group data elements from consecutive iterations into a vector and operate | |
441 on them in parallel. Depending on available target support the loop is | |
442 conceptually unrolled by a factor @code{VF} (vectorization factor), which is | |
443 the number of elements operated upon in parallel in each iteration, and the | |
444 @code{VF} copies of each scalar operation are fused to form a vector operation. | |
445 Additional loop transformations such as peeling and versioning may take place | |
446 to align the number of iterations, and to align the memory accesses in the | |
447 loop. | |
448 The pass is implemented in @file{tree-vectorizer.c} (the main driver), | |
449 @file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts | |
450 and general loop utilities), @file{tree-vect-slp.c} (loop-aware SLP | |
451 functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. | |
0 | 452 Analysis of data references is in @file{tree-data-ref.c}. |
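The conceptual unrolling by @code{VF} can be sketched in plain C. This is only
an analogy under assumed values: @code{VF} is fixed at 4 here, the inner
@code{lane} loop stands in for a single vector operation, and the trailing loop
plays the role of the peeled iterations that handle a trip count not divisible
by @code{VF}. The function names are invented for the example.

```c
enum { VF = 4 };   /* Assumed vectorization factor, for illustration.  */

/* Original scalar loop.  */
static void
add_scalar (float *c, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

/* Conceptually unrolled form.  */
static void
add_unrolled (float *c, const float *a, const float *b, int n)
{
  int i = 0;
  /* Main loop: VF scalar copies stand in for one vector operation.  */
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      c[i + lane] = a[i + lane] + b[i + lane];
  /* Epilogue for the n % VF leftover iterations.  */
  for (; i < n; i++)
    c[i] = a[i] + b[i];
}
```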
453 | |
454 SLP Vectorization. This pass performs vectorization of straight-line code. The | |
455 pass is implemented in @file{tree-vectorizer.c} (the main driver), | |
456 @file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and | |
457 @file{tree-vect-data-refs.c}. | |
458 | |
0 | 459 Autoparallelization. This pass splits the loop iteration space to run |
460 in several threads. The pass is implemented in @file{tree-parloops.c}. | |
461 | |
462 Graphite is a loop transformation framework based on the polyhedral | |
463 model. Graphite stands for Gimple Represented as Polyhedra. The | |
464 internals of this infrastructure are documented in | |
465 @w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on | |
466 this representation are implemented in the various @file{graphite-*} | |
467 files. | |
468 | |
0 | 469 @item Tree level if-conversion for vectorizer |
470 | |
471 This pass applies if-conversion to simple loops to help the vectorizer. | |
472 We identify if-convertible loops, if-convert statements and merge | |
473 basic blocks into one big block. The idea is to present the loop in such | |
474 a form that the vectorizer can have a one-to-one mapping between statements | |
475 and available vector operations. This pass re-introduces COND_EXPR | |
476 at the GIMPLE level. This pass is located in @file{tree-if-conv.c} and is | |
477 described by @code{pass_if_conversion}. | |
478 | |
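A source-level analogy of if-conversion is shown below. It is hand-written with
invented function names; the conditional expression in the second loop stands
in for the @code{COND_EXPR} mentioned above. Note that the converted loop
stores to every element, so this sketch assumes such unconditional stores are
acceptable.

```c
/* Loop body with control flow: two basic blocks per iteration.  */
static void
clamp_branchy (int *a, int n, int limit)
{
  for (int i = 0; i < n; i++)
    if (a[i] > limit)
      a[i] = limit;
}

/* If-converted body: a single block containing one select, which maps
   directly onto a vector select operation where one is available.  */
static void
clamp_if_converted (int *a, int n, int limit)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] > limit ? limit : a[i];
}
```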
479 @item Conditional constant propagation | |
480 | |
481 This pass relaxes a lattice of values in order to identify those | |
482 that must be constant even in the presence of conditional branches. | |
483 The pass is located in @file{tree-ssa-ccp.c} and is described | |
484 by @code{pass_ccp}. | |
485 | |
486 A related pass that works on memory loads and stores, and not just | |
487 register values, is located in @file{tree-ssa-ccp.c} and described by | |
488 @code{pass_store_ccp}. | |
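The lattice can be sketched with three states and a meet operation. This is a
textbook-style model under stated assumptions, not the code in
@file{tree-ssa-ccp.c}: the @code{toy_} names are invented, constants are plain
integers, and a value arriving over an edge not yet known to be executable is
assumed to stay @code{TOY_UNDEFINED}, which is how conditional branches are
taken into account in this model.

```c
/* Hypothetical three-level lattice.  */
enum toy_lattice { TOY_UNDEFINED, TOY_CONSTANT, TOY_VARYING };

struct toy_value
{
  enum toy_lattice state;
  int constant;          /* Meaningful only when state is TOY_CONSTANT.  */
};

/* Combine the values flowing into a merge point.  */
static struct toy_value
toy_meet (struct toy_value a, struct toy_value b)
{
  if (a.state == TOY_UNDEFINED)
    return b;
  if (b.state == TOY_UNDEFINED)
    return a;
  if (a.state == TOY_CONSTANT && b.state == TOY_CONSTANT
      && a.constant == b.constant)
    return a;
  struct toy_value varying = { TOY_VARYING, 0 };
  return varying;
}
```

Merging a constant with an undefined value keeps the constant, so a variable
assigned only on the executable side of a branch is still seen as constant.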
489 | |
490 @item Conditional copy propagation | |
491 | |
492 This is similar to constant propagation but the lattice of values is | |
493 the ``copy-of'' relation. It eliminates redundant copies from the | |
494 code. The pass is located in @file{tree-ssa-copy.c} and described by | |
495 @code{pass_copy_prop}. | |
496 | |
497 A related pass that works on memory copies, and not just register | |
498 copies, is located in @file{tree-ssa-copy.c} and described by | |
499 @code{pass_store_copy_prop}. | |
500 | |
501 @item Value range propagation | |
502 | |
503 This transformation is similar to constant propagation but | |
504 instead of propagating single constant values, it propagates | |
505 known value ranges. The implementation is based on Patterson's | |
506 range propagation algorithm (Accurate Static Branch Prediction by | |
507 Value Range Propagation, J. R. C. Patterson, PLDI '95). In | |
508 contrast to Patterson's algorithm, this implementation does not | |
509 propagate branch probabilities, nor does it use more than a single | |
510 range per SSA name. This means that the current implementation | |
511 cannot be used for branch prediction (though adapting it would | |
512 not be difficult). The pass is located in @file{tree-vrp.c} and is | |
513 described by @code{pass_vrp}. | |
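A minimal sketch of the single-range-per-name idea follows. The
@code{toy_range} type and both helpers are hypothetical and far simpler than
what @file{tree-vrp.c} does; they only show how a range learned on one branch
lets a later comparison be folded.

```c
/* Hypothetical value range with inclusive bounds.  */
struct toy_range { int lo, hi; };

/* Range of X on the true edge of `x < bound'.  Assumes that
   bound - 1 does not overflow.  */
static struct toy_range
toy_assume_lt (struct toy_range r, int bound)
{
  if (r.hi > bound - 1)
    r.hi = bound - 1;
  return r;
}

/* Fold `x < bound': 1 if always true, 0 if always false,
   -1 if the range does not decide it.  */
static int
toy_fold_lt (struct toy_range r, int bound)
{
  if (r.hi < bound)
    return 1;
  if (r.lo >= bound)
    return 0;
  return -1;
}
```

For example, once @code{x < 10} is known to hold for a value that started in
the range 0 to 1000, a nested test @code{x < 100} folds to true and
@code{x < 0} folds to false.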
514 | |
515 @item Folding built-in functions | |
516 | |
517 This pass simplifies built-in functions, as applicable, with constant | |
518 arguments or with inferable string lengths. It is located in | |
519 @file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}. | |
520 | |
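A minimal illustration, assuming the call is one the compiler chooses to
fold (which depends on the GCC version and options in use); the function
names are hypothetical:

```c
#include <string.h>

/* Before: a library call whose argument is a string constant. */
size_t before_fold(void)
{
    return strlen("hello");
}

/* After: the length is known at compile time, so no call is needed. */
size_t after_fold(void)
{
    return 5;
}
```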
521 @item Split critical edges | |
522 | |
523 This pass identifies critical edges and inserts empty basic blocks | |
524 such that the edge is no longer critical. The pass is located in | |
525 @file{tree-cfg.c} and is described by @code{pass_split_crit_edges}. | |
526 | |
527 @item Control dependence dead code elimination | |
528 | |
529 This pass is a stronger form of dead code elimination that can | |
530 eliminate unnecessary control flow statements. It is located | |
531 in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}. | |
532 | |
533 @item Tail call elimination | |
534 | |
535 This pass identifies function calls that may be rewritten into | |
536 jumps. No code transformation is actually applied here, but the | |
537 data and control flow problem is solved. The code transformation | |
538 requires target support, and so is delayed until RTL@. In the | |
539 meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility. | |
540 The pass is located in @file{tree-tailcall.c} and is described by | |
541 @code{pass_tail_calls}. The RTL transformation is handled by | |
542 @code{fixup_tail_calls} in @file{calls.c}. | |
543 | |
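To show what ``rewritten into a jump'' means, here is a small C sketch
with hypothetical function names. C cannot express a jump into another
function, so a self-recursive call is used; the loop in the second
function plays the role of the jump.

```c
/* The recursive call is in tail position: nothing remains to be done
   in this invocation after the call returns. */
unsigned gcd_call(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd_call(b, a % b);
}

/* The same computation with the tail call replaced by a jump back to
   the top of the function, reusing the current frame. */
unsigned gcd_jump(unsigned a, unsigned b)
{
    for (;;) {
        if (b == 0)
            return a;
        unsigned t = a % b;
        a = b;
        b = t;
    }
}
```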
544 @item Warn for function return without value | |
545 | |
546 For non-void functions, this pass locates return statements that do | |
547 not specify a value and issues a warning. Such a statement may have | |
548 been injected by falling off the end of the function. This pass is | |
549 run last so that we have as much time as possible to prove that the | |
550 statement is not reachable. It is located in @file{tree-cfg.c} and | |
551 is described by @code{pass_warn_function_return}. | |
552 | |
553 @item Mudflap statement annotation | |
554 | |
555 If mudflap is enabled, we rewrite some memory accesses with code to | |
556 validate that the memory access is correct. In particular, expressions | |
557 involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF}, | |
558 etc.) are replaced by code that checks the selected address range | |
559 against the mudflap runtime's database of valid regions. This check | |
560 includes an inline lookup into a direct-mapped cache, based on | |
561 shift/mask operations of the pointer value, with a fallback function | |
562 call into the runtime. The pass is located in @file{tree-mudflap.c} and | |
563 is described by @code{pass_mudflap_2}. | |
564 | |
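The shape of such a check can be sketched in C as below. Every name,
constant and data layout here is an assumption made for illustration; it
is not the code the pass emits nor the mudflap runtime interface. The
stand-in for the runtime call simply records the range as valid, where
the real runtime would consult its database of valid regions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry. */
#define CACHE_SHIFT 2
#define CACHE_MASK  1023u

struct cache_entry { uintptr_t low, high; };   /* a known-valid range */

struct cache_entry lookup_cache[CACHE_MASK + 1];
unsigned slow_path_calls;                      /* counts fallback calls */

/* Stand-in for the fallback call into the runtime. */
void runtime_check(uintptr_t low, uintptr_t high)
{
    struct cache_entry *e = &lookup_cache[(low >> CACHE_SHIFT) & CACHE_MASK];

    e->low = low;
    e->high = high;
    slow_path_calls++;
}

/* Inline check placed before an access to size bytes at ptr. */
void check_access(const void *ptr, size_t size)
{
    uintptr_t low = (uintptr_t)ptr;
    uintptr_t high = low + size - 1;
    const struct cache_entry *e =
        &lookup_cache[(low >> CACHE_SHIFT) & CACHE_MASK];

    if (e->low > low || e->high < high)        /* cache miss */
        runtime_check(low, high);
}

/* What an instrumented dereference might look like. */
int checked_read(const int *p)
{
    check_access(p, sizeof *p);
    return *p;
}
```

In this sketch the first read of an object takes the slow path and later
reads of the same object hit the cache.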
565 @item Leave static single assignment form | |
566 | |
567 This pass rewrites the function such that it is in normal form. At | |
568 the same time, we eliminate as many single-use temporaries as possible, | |
569 so the intermediate language is no longer GIMPLE, but GENERIC@. The | |
570 pass is located in @file{tree-outof-ssa.c} and is described by | |
571 @code{pass_del_ssa}. | |
572 | |
573 @item Merge PHI nodes that feed into one another | |
574 | |
575 This is part of the CFG cleanup passes. It attempts to join PHI nodes | |
576 from a forwarder CFG block into another block with PHI nodes. The | |
577 pass is located in @file{tree-cfgcleanup.c} and is described by | |
578 @code{pass_merge_phi}. | |
579 | |
580 @item Return value optimization | |
581 | |
582 If a function always returns the same local variable, and that local | |
583 variable is an aggregate type, then the variable is replaced with the | |
584 return value for the function (i.e., the function's @code{DECL_RESULT}). This | |
585 is equivalent to the C++ named return value optimization applied to | |
586 GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by | |
587 @code{pass_nrv}. | |
588 | |
589 @item Return slot optimization | |
590 | |
591 If a function returns a memory object and is called as @code{var = | |
592 foo()}, this pass tries to change the call so that the address of | |
593 @code{var} is passed to the callee, avoiding an extra memory copy. This | |
594 pass is located in @file{tree-nrv.c} and is described by | |
595 @code{pass_return_slot}. | |
596 | |
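Conceptually, the effect resembles the following hand-written C, in which
the second function receives the address of the caller's variable and
fills it in directly. The type and function names are hypothetical, and
the real transformation happens on GIMPLE rather than on source code.

```c
struct point3 { double x, y, z; };

/* Before: the result is built in a local and copied out on return. */
struct point3 make_point(double s)
{
    struct point3 r;

    r.x = s;
    r.y = 2 * s;
    r.z = 3 * s;
    return r;
}

/* After (conceptually): the callee writes straight into the caller's
   object, so no temporary and no copy are needed. */
void make_point_into(struct point3 *result, double s)
{
    result->x = s;
    result->y = 2 * s;
    result->z = 3 * s;
}
```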
597 @item Optimize calls to @code{__builtin_object_size} | |
598 | |
599 This is a propagation pass similar to CCP that tries to remove calls | |
600 to @code{__builtin_object_size} when the size of the object can be | |
601 computed at compile-time. This pass is located in | |
602 @file{tree-object-size.c} and is described by | |
603 @code{pass_object_sizes}. | |
604 | |
605 @item Loop invariant motion | |
606 | |
607 This pass moves expensive loop-invariant computations out of loops. | |
608 The pass is located in @file{tree-ssa-loop.c} and described by | |
609 @code{pass_lim}. | |
610 | |
611 @item Loop nest optimizations | |
612 | |
613 This is a family of loop transformations that works on loop nests. It | |
614 includes loop interchange, scaling, skewing and reversal, all of which are | |
615 geared to the optimization of data locality in array traversals | |
616 and the removal of dependencies that hamper optimizations such as loop | |
617 parallelization and vectorization. The pass is located in | |
618 @file{tree-loop-linear.c} and described by | |
619 @code{pass_linear_transform}. | |
620 | |
621 @item Removal of empty loops | |
622 | |
623 This pass removes loops with no code in them. The pass is located in | |
624 @file{tree-ssa-loop-ivcanon.c} and described by | |
625 @code{pass_empty_loop}. | |
626 | |
627 @item Unrolling of small loops | |
628 | |
629 This pass completely unrolls loops with few iterations. The pass | |
630 is located in @file{tree-ssa-loop-ivcanon.c} and described by | |
631 @code{pass_complete_unroll}. | |
632 | |
633 @item Predictive commoning | |
634 | |
635 This pass makes the code reuse the computations from the previous | |
636 iterations of the loops, especially loads and stores to memory. | |
637 It does so by storing the values of these computations to a bank | |
638 of temporary variables that are rotated at the end of loop. To avoid | |
639 the need for this rotation, the loop is then unrolled and the copies | |
640 of the loop body are rewritten to use the appropriate version of | |
641 the temporary variable. This pass is located in @file{tree-predcom.c} | |
642 and described by @code{pass_predcom}. | |
643 | |
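The idea of carrying values across iterations in rotating temporaries
can be sketched as follows (illustrative C with made-up names). The
sketch stops at the rotation step; it does not show the subsequent
unrolling that removes the copies between temporaries.

```c
/* Before: each iteration reloads two values that earlier iterations
   already had in hand. */
void recur_before(int *a, int n)
{
    for (int i = 2; i < n; i++)
        a[i] = a[i - 1] + a[i - 2];
}

/* After: the two most recent values live in temporaries that are
   rotated at the end of each iteration, so the loop body performs no
   loads from the array. */
void recur_after(int *a, int n)
{
    if (n < 3)
        return;                 /* the loop would not execute */

    int t0 = a[0], t1 = a[1];

    for (int i = 2; i < n; i++) {
        int t2 = t0 + t1;

        a[i] = t2;
        t0 = t1;                /* rotate the bank of temporaries */
        t1 = t2;
    }
}
```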
644 @item Array prefetching | |
645 | |
646 This pass issues prefetch instructions for array references inside | |
647 loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and | |
648 described by @code{pass_loop_prefetch}. | |
649 | |
650 @item Reassociation | |
651 | |
652 This pass rewrites arithmetic expressions to enable optimizations that | |
653 operate on them, like redundancy elimination and vectorization. The | |
654 pass is located in @file{tree-ssa-reassoc.c} and described by | |
655 @code{pass_reassoc}. | |
656 | |
657 @item Optimization of @code{stdarg} functions | |
658 | |
659 This pass tries to avoid the saving of register arguments into the | |
660 stack on entry to @code{stdarg} functions. If the function doesn't | |
661 use any @code{va_start} macros, no registers need to be saved. If | |
662 @code{va_start} macros are used but the @code{va_list} variables don't | |
663 escape the function, it is only necessary to save registers that will | |
664 be used in @code{va_arg} macros. For instance, if @code{va_arg} is | |
665 only used with integral types in the function, floating point | |
666 registers don't need to be saved. This pass is located in | |
667 @file{tree-stdarg.c} and described by @code{pass_stdarg}. | |
668 | |
669 @end itemize | |
670 | |
671 @node RTL passes | |
672 @section RTL passes | |
673 | |
674 The following briefly describes the RTL generation and optimization |
675 passes that are run after the Tree optimization passes. |
0 | 676 |
677 @itemize @bullet | |
678 @item RTL generation | |
679 | |
680 @c Avoiding overfull is tricky here. | |
681 The source files for RTL generation include | |
682 @file{stmt.c}, | |
683 @file{calls.c}, | |
684 @file{expr.c}, | |
685 @file{explow.c}, | |
686 @file{expmed.c}, | |
687 @file{function.c}, | |
688 @file{optabs.c} | |
689 and @file{emit-rtl.c}. | |
690 Also, the file | |
691 @file{insn-emit.c}, generated from the machine description by the | |
692 program @code{genemit}, is used in this pass. The header file | |
693 @file{expr.h} is used for communication within this pass. | |
694 | |
695 @findex genflags | |
696 @findex gencodes | |
697 The header files @file{insn-flags.h} and @file{insn-codes.h}, | |
698 generated from the machine description by the programs @code{genflags} | |
699 and @code{gencodes}, tell this pass which standard names are available | |
700 for use and which patterns correspond to them. | |
701 | |
702 @item Generation of exception landing pads |
0 | 703 |
704 This pass generates the glue that handles communication between the | |
705 exception handling library routines and the exception handlers within | |
706 the function. Entry points in the function that are invoked by the | |
707 exception handling library are called @dfn{landing pads}. The code | |
708 for this pass is located in @file{except.c}. |
0 | 709 |
710 @item Control flow graph cleanup |
0 | 711 |
712 This pass removes unreachable code, simplifies jumps to next, jumps to | |
713 jump, jumps across jumps, etc. The pass is run multiple times. | |
714 For historical reasons, it is occasionally referred to as the ``jump | |
715 optimization pass''. The bulk of the code for this pass is in | |
716 @file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c} | |
717 and @file{jump.c}. | |
718 | |
719 @item Forward propagation of single-def values | |
720 | |
721 This pass attempts to remove redundant computation by substituting | |
722 variables that come from a single definition, and | |
723 seeing if the result can be simplified. It performs copy propagation | |
724 and addressing mode selection. The pass is run twice, with values | |
725 being propagated into loops only on the second run. The code is |
726 located in @file{fwprop.c}. |
0 | 727 |
728 @item Common subexpression elimination | |
729 | |
730 This pass removes redundant computation within basic blocks, and | |
731 optimizes addressing modes based on cost. The pass is run twice. | |
732 The code for this pass is located in @file{cse.c}. |
0 | 733 |
734 @item Global common subexpression elimination |
0 | 735 |
736 This pass performs two | |
737 different types of GCSE depending on whether you are optimizing for | |
738 size or not (LCM based GCSE tends to increase code size for a gain in | |
739 speed, while Morel-Renvoise based GCSE does not). | |
740 When optimizing for size, GCSE is done using Morel-Renvoise Partial | |
741 Redundancy Elimination, with the exception that it does not try to move | |
742 invariants out of loops---that is left to the loop optimization pass. | |
743 If MR PRE GCSE is done, code hoisting (aka unification) is also done, as | |
744 well as load motion. | |
745 If you are optimizing for speed, LCM (lazy code motion) based GCSE is | |
746 done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM | |
747 based GCSE also does loop invariant code motion. We also perform load | |
748 and store motion when optimizing for speed. | |
749 Regardless of which type of GCSE is used, the GCSE pass also performs | |
750 global constant and copy propagation. | |
751 The source file for this pass is @file{gcse.c}, and the LCM routines | |
752 are in @file{lcm.c}. | |
753 | |
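Partial redundancy elimination can be pictured with the following C
sketch (made-up names; unsigned arithmetic keeps the example free of
overflow concerns). The product @code{a * b} is computed twice on the
path where @code{c} is nonzero; inserting a computation on the other path
makes the later one fully redundant.

```c
/* Before: a * b is evaluated twice when c is nonzero. */
unsigned before_pre(unsigned a, unsigned b, int c)
{
    unsigned x = 0;

    if (c)
        x = a * b;
    return x + a * b;           /* partially redundant */
}

/* After: the value is available in t on every path, so the final
   expression reuses it. */
unsigned after_pre(unsigned a, unsigned b, int c)
{
    unsigned x = 0;
    unsigned t;

    if (c) {
        t = a * b;
        x = t;
    } else {
        t = a * b;              /* inserted computation */
    }
    return x + t;
}
```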
754 @item Loop optimization | |
755 | |
756 This pass performs several loop related optimizations. | |
757 The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain | |
758 generic loop analysis and manipulation code. Initialization and finalization | |
759 of loop structures is handled by @file{loop-init.c}. | |
760 A loop invariant motion pass is implemented in @file{loop-invariant.c}. | |
761 Basic block level optimizations---unrolling, peeling and unswitching loops--- | |
762 are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}. | |
763 Replacement of the exit condition of loops by special machine-dependent | |
764 instructions is handled by @file{loop-doloop.c}. | |
765 | |
766 @item Jump bypassing | |
767 | |
768 This pass is an aggressive form of GCSE that transforms the control | |
769 flow graph of a function by propagating constants into conditional | |
770 branch instructions. The source file for this pass is @file{gcse.c}. | |
771 | |
772 @item If conversion | |
773 | |
774 This pass attempts to replace conditional branches and surrounding | |
775 assignments with arithmetic, boolean value producing comparison | |
776 instructions, and conditional move instructions. In the very last | |
777 invocation after reload, it will generate predicated instructions | |
778 when supported by the target. The code is located in @file{ifcvt.c}. |
0 | 779 |
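In source terms, the effect is similar to the rewrite below, where a
branch around an increment becomes an addition of the comparison result.
The function names are hypothetical, and the actual pass works on RTL,
choosing among the forms the target supports.

```c
/* Before: a conditional branch guards the increment. */
int count_nonzero_branch(const int *v, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (v[i] != 0)
            count++;
    return count;
}

/* After: the comparison yields 0 or 1, which is added directly. */
int count_nonzero_ifcvt(const int *v, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        count += (v[i] != 0);
    return count;
}
```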
780 @item Web construction | |
781 | |
782 This pass splits independent uses of each pseudo-register. This can | |
783 improve the effect of other transformations, such as CSE or register | |
784 allocation. The code for this pass is located in @file{web.c}. |
0 | 785 |
786 @item Instruction combination | |
787 | |
788 This pass attempts to combine groups of two or three instructions that | |
789 are related by data flow into single instructions. It combines the | |
790 RTL expressions for the instructions by substitution, simplifies the | |
791 result using algebra, and then attempts to match the result against | |
792 the machine description. The code is located in @file{combine.c}. |
0 | 793 |
794 @item Register movement | |
795 | |
796 This pass looks for cases where matching constraints would force an | |
797 instruction to need a reload, and this reload would be a | |
798 register-to-register move. It then attempts to change the registers | |
799 used by the instruction to avoid the move instruction. The code is |
800 located in @file{regmove.c}. |
0 | 801 |
802 @item Mode switching optimization |
0 | 803 |
804 This pass looks for instructions that require the processor to be in a | |
805 specific ``mode'' and minimizes the number of mode changes required to | |
806 satisfy all users. What these modes are and what they apply to are | |
807 completely target-specific. The code for this pass is located in |
808 @file{mode-switching.c}. |
0 | 809 |
810 @cindex modulo scheduling | |
811 @cindex sms, swing, software pipelining | |
812 @item Modulo scheduling | |
813 | |
814 This pass looks at innermost loops and reorders their instructions | |
815 by overlapping different iterations. Modulo scheduling is performed | |
816 immediately before instruction scheduling. The code for this pass is |
817 located in @file{modulo-sched.c}. |
0 | 818 |
819 @item Instruction scheduling | |
820 | |
821 This pass looks for instructions whose output will not be available by | |
822 the time that it is used in subsequent instructions. Memory loads and | |
823 floating point instructions often have this behavior on RISC machines. | |
824 It re-orders instructions within a basic block to try to separate the | |
825 definition and use of items that otherwise would cause pipeline | |
826 stalls. This pass is performed twice, before and after register | |
827 allocation. The code for this pass is located in @file{haifa-sched.c}, |
0 | 828 @file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and |
829 @file{sched-vis.c}. | |
830 | |
831 @item Register allocation | |
832 | |
833 These passes make sure that all occurrences of pseudo registers are | |
834 eliminated, either by allocating them to a hard register, replacing | |
835 them by an equivalent expression (e.g.@: a constant) or by placing | |
836 them on the stack. This is done in several subpasses: | |
837 | |
838 @itemize @bullet | |
839 @item | |
840 Register move optimizations. This pass makes some simple RTL code | |
841 transformations which improve the subsequent register allocation. The | |
842 source file is @file{regmove.c}. | |
843 | |
844 @item | |
845 The integrated register allocator (@acronym{IRA}). It is called | |
846 integrated because coalescing, register live range splitting, and hard | |
847 register preferencing are done on-the-fly during coloring. It also | |
848 has better integration with the reload pass. Pseudo-registers spilled | |
849 by the allocator or the reload still have a chance to get | |
850 hard-registers if the reload evicts some pseudo-registers from | |
851 hard-registers. The allocator helps to choose better pseudos for | |
852 spilling based on their live ranges and to coalesce stack slots | |
853 allocated for the spilled pseudo-registers. IRA is a regional | |
854 register allocator which is transformed into a Chaitin-Briggs allocator | |
855 if there is one region. By default, IRA chooses regions using | |
856 register pressure but the user can force it to use one region or | |
857 regions corresponding to all loops. | |
858 | |
859 Source files of the allocator are @file{ira.c}, @file{ira-build.c}, | |
860 @file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c}, | |
861 @file{ira-emit.c}, @file{ira-lives.c}, plus header files @file{ira.h} | |
862 and @file{ira-int.h} used for the communication between the allocator | |
863 and the rest of the compiler and between the IRA files. | |
864 | |
865 @cindex reloading | |
866 @item | |
867 Reloading. This pass renumbers pseudo registers with the hardware | |
868 register numbers they were allocated. Pseudo registers that did not | |
869 get hard registers are replaced with stack slots. Then it finds | |
870 instructions that are invalid because a value has failed to end up in | |
871 a register, or has ended up in a register of the wrong kind. It fixes | |
872 up these instructions by reloading the problematical values | |
873 temporarily into registers. Additional instructions are generated to | |
874 do the copying. | |
875 | |
876 The reload pass also optionally eliminates the frame pointer and inserts | |
877 instructions to save and restore call-clobbered registers around calls. | |
878 | |
879 Source files are @file{reload.c} and @file{reload1.c}, plus the header | |
880 @file{reload.h} used for communication between them. | |
881 @end itemize | |
882 | |
883 @item Basic block reordering | |
884 | |
885 This pass implements profile guided code positioning. If profile | |
886 information is not available, various types of static analysis are | |
887 performed to make the predictions normally coming from the profile | |
888 feedback (i.e., execution frequency, branch probability, etc.). It is | |
889 implemented in the file @file{bb-reorder.c}, and the various | |
890 prediction routines are in @file{predict.c}. | |
891 | |
892 @item Variable tracking | |
893 | |
894 This pass computes where variables are stored at each | |
895 position in the code and adds notes describing the variable locations | |
896 to the RTL code. Location lists are then generated from these | |
897 notes for the debug information, if the debugging information format supports | |
898 location lists. The code is located in @file{var-tracking.c}. | |
0 | 899 |
900 @item Delayed branch scheduling | |
901 | |
902 This optional pass attempts to find instructions that can go into the | |
903 delay slots of other instructions, usually jumps and calls. The code |
904 for this pass is located in @file{reorg.c}. |
0 | 905 |
906 @item Branch shortening | |
907 | |
908 On many RISC machines, branch instructions have a limited range. | |
909 Thus, longer sequences of instructions must be used for long branches. | |
910 In this pass, the compiler figures out how far each instruction | |
911 will be from each other instruction, and therefore whether the usual | |
912 instructions, or the longer sequences, must be used for each branch. | |
913 The code for this pass is located in @file{final.c}. |
0 | 914 |
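The interplay between instruction sizes and branch distances can be
sketched as a small fixed-point computation. The code below is a generic
illustration under assumed encoding sizes and reach; the structure, the
constants and the function name are all hypothetical, and this is not the
algorithm used in @file{final.c}.

```c
#include <stdlib.h>

/* Hypothetical encoding sizes (bytes) and short-branch reach. */
enum { SHORT_SIZE = 2, LONG_SIZE = 6, SHORT_REACH = 8 };

struct insn {
    int size;       /* current size in bytes */
    int target;     /* index of the branch target, or -1 if not a branch */
};

/* Switch out-of-range branches to the long form until nothing changes.
   Sizes only grow, so the loop terminates.  Returns the total size. */
int shorten_branches_sketch(struct insn *insns, int n)
{
    int changed;

    do {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (insns[i].target < 0 || insns[i].size == LONG_SIZE)
                continue;

            /* Addresses of this insn and its target under current sizes. */
            int from = 0, to = 0;
            for (int k = 0; k < i; k++)
                from += insns[k].size;
            for (int k = 0; k < insns[i].target; k++)
                to += insns[k].size;

            if (abs(to - from) > SHORT_REACH) {
                insns[i].size = LONG_SIZE;
                changed = 1;
            }
        }
    } while (changed);

    int total = 0;
    for (int i = 0; i < n; i++)
        total += insns[i].size;
    return total;
}
```

Growing one branch can push another branch's target out of range, which
is why the sketch repeats until no size changes.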
915 @item Register-to-stack conversion | |
916 | |
917 Conversion from usage of some hard registers to usage of a register | |
918 stack may be done at this point. Currently, this is supported only | |
919 for the floating-point registers of the Intel 80387 coprocessor. The |
920 code for this pass is located in @file{reg-stack.c}. |
0 | 921 |
922 @item Final | |
923 | |
924 This pass outputs the assembler code for the function. The source files | |
925 are @file{final.c} plus @file{insn-output.c}; the latter is generated | |
926 automatically from the machine description by the tool @code{genoutput}. | |
927 The header file @file{conditions.h} is used for communication between | |
928 these files. If mudflap is enabled, the queue of deferred declarations | |
929 and any addressed constants (e.g., string literals) is processed by | |
930 @code{mudflap_finish_file} into a synthetic constructor function | |
931 containing calls into the mudflap runtime. | |
932 | |
933 @item Debugging information output | |
934 | |
935 This is run after final because it must output the stack slot offsets | |
936 for pseudo registers that did not get hard registers. Source files | |
937 are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for | |
938 SDB symbol table format, @file{dwarfout.c} for DWARF symbol table | |
939 format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2 | |
940 symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table | |
941 format. | |
942 | |
943 @end itemize |