comparison tools/clang/docs/PCHInternals.rst @ 3:9ad51c7bc036
1st commit. remove git dir and add all files.
author | Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> |
---|---|
date | Wed, 15 May 2013 06:43:32 +0900 |
parents | |
children | afa8332a0e37 |
-1:000000000000 | 3:9ad51c7bc036 |
---|---|
1 ======================================== | |
2 Precompiled Header and Modules Internals | |
3 ======================================== | |
4 | |
5 .. contents:: | |
6 :local: | |
7 | |
8 This document describes the design and implementation of Clang's precompiled | |
9 headers (PCH) and modules. If you are interested in the end-user view, please | |
10 see the :ref:`User's Manual <usersmanual-precompiled-headers>`. | |
11 | |
12 Using Precompiled Headers with ``clang`` | |
13 ---------------------------------------- | |
14 | |
15 The Clang compiler frontend, ``clang -cc1``, supports two command line options | |
16 for generating and using PCH files. | |
17 | |
18 To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`: | |
19 | |
20 .. code-block:: bash | |
21 | |
22 $ clang -cc1 test.h -emit-pch -o test.h.pch | |
23 | |
24 This option is transparently used by ``clang`` when generating PCH files. The | |
25 resulting PCH file contains the serialized form of the compiler's internal | |
26 representation after it has completed parsing and semantic analysis. The PCH | |
27 file can then be used as a prefix header with the :option:`-include-pch` | |
28 option: | |
29 | |
30 .. code-block:: bash | |
31 | |
32 $ clang -cc1 -include-pch test.h.pch test.c -o test.s | |
33 | |
34 Design Philosophy | |
35 ----------------- | |
36 | |
37 Precompiled headers are meant to improve overall compile times for projects, so | |
38 the design of precompiled headers is entirely driven by performance concerns. | |
39 The use case for precompiled headers is relatively simple: when there is a | |
40 common set of headers that is included in nearly every source file in the | |
41 project, we *precompile* that bundle of headers into a single precompiled | |
42 header (PCH file). Then, when compiling the source files in the project, we | |
43 load the PCH file first (as a prefix header), which acts as a stand-in for that | |
44 bundle of headers. | |
45 | |
46 A precompiled header implementation improves performance when: | |
47 | |
48 * Loading the PCH file is significantly faster than re-parsing the bundle of | |
49 headers stored within the PCH file. Thus, a precompiled header design | |
50 attempts to minimize the cost of reading the PCH file. Ideally, this cost | |
51 should not vary with the size of the precompiled header file. | |
52 | |
53 * The cost of generating the PCH file initially is not so large that it | |
54 counters the per-source-file performance improvement due to eliminating the | |
55 need to parse the bundled headers in the first place. This is particularly | |
56 important on multi-core systems, because PCH file generation serializes the | |
57 build when all compilations require the PCH file to be up-to-date. | |
58 | |
59 Modules, as implemented in Clang, use the same mechanisms as precompiled | |
60 headers to save a serialized AST file (one per module) and use those AST | |
61 modules. From an implementation standpoint, modules are a generalization of | |
62 precompiled headers, lifting a number of restrictions placed on precompiled | |
63 headers. In particular, there can only be one precompiled header and it must | |
64 be included at the beginning of the translation unit. The extensions to the | |
65 AST file format required for modules are discussed in the section on | |
66 :ref:`modules <pchinternals-modules>`. | |
67 | |
68 Clang's AST files are designed with a compact on-disk representation, which | |
69 minimizes both creation time and the time required to initially load the AST | |
70 file. The AST file itself contains a serialized representation of Clang's | |
71 abstract syntax trees and supporting data structures, stored using the same | |
72 compressed bitstream as `LLVM's bitcode file format | |
73 <http://llvm.org/docs/BitCodeFormat.html>`_. | |
74 | |
75 Clang's AST files are loaded "lazily" from disk. When an AST file is initially | |
76 loaded, Clang reads only a small amount of data from the AST file to establish | |
77 where certain important data structures are stored. The amount of data read in | |
78 this initial load is independent of the size of the AST file, such that a | |
79 larger AST file does not lead to longer AST load times. The actual header data | |
80 in the AST file --- macros, functions, variables, types, etc. --- is loaded | |
81 only when it is referenced from the user's code, at which point only that | |
82 entity (and those entities it depends on) are deserialized from the AST file. | |
83 With this approach, the cost of using an AST file for a translation unit is | |
84 proportional to the amount of code actually used from the AST file, rather than | |
85 being proportional to the size of the AST file itself. | |
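
The lazy-loading behavior described above can be modeled in a few lines. This
is a minimal sketch and not Clang's implementation: ``LazyASTFile``, its
record layout, and the ``reads`` counter are hypothetical names chosen for
illustration. It shows that only the referenced entity and the entities it
depends on are ever decoded.

.. code-block:: python

  class LazyASTFile:
      """Toy model of an AST file: records are decoded only on first use."""

      def __init__(self, records):
          # `records` maps entity name -> (payload, names it depends on).
          self._records = records
          self._loaded = {}
          self.reads = 0  # number of records actually deserialized

      def get(self, name):
          # Deserialize the entity and, transitively, what it depends on.
          if name not in self._loaded:
              payload, deps = self._records[name]
              self.reads += 1
              self._loaded[name] = payload
              for dep in deps:
                  self.get(dep)
          return self._loaded[name]


  ast = LazyASTFile({
      "printf": ("int printf(const char *, ...)", []),
      "FILE": ("struct FILE", []),
      "fopen": ("FILE *fopen(const char *, const char *)", ["FILE"]),
  })
  ast.get("fopen")  # pulls in `fopen` and `FILE`, but not `printf`
  print(f"{ast.reads}/3 records read")

The counter plays the same role as the statistics Clang reports: it makes
visible how much of the file a given translation unit actually needed.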
86 | |
87 When given the :option:`-print-stats` option, Clang produces statistics | |
88 describing how much of the AST file was actually loaded from disk. For a | |
89 simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header | |
90 (which is built as a precompiled header), this option illustrates how little of | |
91 the actual precompiled header is required: | |
92 | |
93 .. code-block:: none | |
94 | |
95 *** AST File Statistics: | |
96 895/39981 source location entries read (2.238563%) | |
97 19/15315 types read (0.124061%) | |
98 20/82685 declarations read (0.024188%) | |
99 154/58070 identifiers read (0.265197%) | |
100 0/7260 selectors read (0.000000%) | |
101 0/30842 statements read (0.000000%) | |
102 4/8400 macros read (0.047619%) | |
103 1/4995 lexical declcontexts read (0.020020%) | |
104 0/4413 visible declcontexts read (0.000000%) | |
105 0/7230 method pool entries read (0.000000%) | |
106 0 method pool misses | |
107 | |
108 For this small program, only a tiny fraction of the source locations, types, | |
109 declarations, identifiers, and macros were actually deserialized from the | |
110 precompiled header. These statistics can be useful to determine whether the | |
111 AST file implementation can be improved by making more of the implementation | |
112 lazy. | |
113 | |
114 Precompiled headers can be chained. When you create a PCH while including an | |
115 existing PCH, Clang can create the new PCH by referencing the original file and | |
116 only writing the new data to the new file. For example, you could create a PCH | |
117 out of all the headers that are very commonly used throughout your project, and | |
118 then create a PCH for every single source file in the project that includes the | |
119 code that is specific to that file, so that recompiling the file itself is very | |
120 fast, without duplicating the data from the common headers for every file. The | |
121 mechanisms behind chained precompiled headers are discussed in a :ref:`later | |
122 section <pchinternals-chained>`. | |
123 | |
124 AST File Contents | |
125 ----------------- | |
126 | |
127 Clang's AST files are organized into several different blocks, each of which | |
128 contains the serialized representation of a part of Clang's internal | |
129 representation. Each of the blocks corresponds to either a block or a record | |
130 within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_. | |
131 The contents of each of these logical blocks are described below. | |
132 | |
133 .. image:: PCHLayout.png | |
134 | |
135 For a given AST file, the `llvm-bcanalyzer | |
136 <http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used | |
137 to examine the actual structure of the bitstream for the AST file. This | |
138 information can be used both to help understand the structure of the AST file | |
139 and to isolate areas where AST files can still be optimized, e.g., through the | |
140 introduction of abbreviations. | |
141 | |
142 Metadata Block | |
143 ^^^^^^^^^^^^^^ | |
144 | |
145 The metadata block contains several records that provide information about how | |
146 the AST file was built. This metadata is primarily used to validate the use of | |
147 an AST file. For example, a precompiled header built for a 32-bit x86 target | |
148 cannot be used when compiling for a 64-bit x86 target. The metadata block | |
149 contains information about: | |
150 | |
151 Language options | |
152 Describes the particular language dialect used to compile the AST file, | |
153 including major options (e.g., Objective-C support) and more minor options | |
154 (e.g., support for "``//``" comments). The contents of this record correspond to | |
155 the ``LangOptions`` class. | |
156 | |
157 Target architecture | |
158 The target triple that describes the architecture, platform, and ABI for | |
159 which the AST file was generated, e.g., ``i386-apple-darwin9``. | |
160 | |
161 AST version | |
162 The major and minor version numbers of the AST file format. Changes in the | |
163 minor version number should not affect backward compatibility, while changes | |
164 in the major version number imply that a newer compiler cannot read an older | |
165 precompiled header (and vice-versa). | |
166 | |
167 Original file name | |
168 The full path of the header that was used to generate the AST file. | |
169 | |
170 Predefines buffer | |
171 Although not explicitly stored as part of the metadata, the predefines buffer | |
172 is used in the validation of the AST file. The predefines buffer itself | |
173 contains code generated by the compiler to initialize the preprocessor state | |
174 according to the current target, platform, and command-line options. For | |
175 example, the predefines buffer will contain "``#define __STDC__ 1``" when we | |
176 are compiling C without Microsoft extensions. The predefines buffer itself | |
177 is stored within the :ref:`pchinternals-sourcemgr`, but its contents are | |
178 verified along with the rest of the metadata. | |
179 | |
180 A chained PCH file (that is, one that references another PCH) and a module | |
181 (which may import other modules) have additional metadata containing the list | |
182 of all AST files that this AST file depends on. Each of those files will be | |
183 loaded along with this AST file. | |
184 | |
185 For chained precompiled headers, the language options, target architecture and | |
186 predefines buffer data are taken from the end of the chain, since they have to | |
187 match anyway. | |
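
The validation role of the metadata block can be sketched as follows. This is
an illustrative model only: ``ASTMetadata`` and ``validation_errors`` are
hypothetical names, the record holds just a subset of the fields listed above,
and the assumption that language options must match exactly is a
simplification.

.. code-block:: python

  from dataclasses import dataclass


  @dataclass(frozen=True)
  class ASTMetadata:
      """Hypothetical subset of the metadata block used for validation."""
      lang_options: frozenset
      target_triple: str
      major: int
      minor: int


  def validation_errors(pch, current):
      """Return the reasons `pch` cannot be used by the `current` compilation."""
      errors = []
      if pch.lang_options != current.lang_options:
          errors.append("language options differ")
      if pch.target_triple != current.target_triple:
          errors.append("target triple differs")
      if pch.major != current.major:  # minor-version changes are tolerated
          errors.append("incompatible AST format version")
      return errors


  built = ASTMetadata(frozenset({"objc"}), "i386-apple-darwin9", 1, 0)
  using = ASTMetadata(frozenset({"objc"}), "x86_64-apple-darwin9", 1, 2)
  print(validation_errors(built, using))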
188 | |
189 .. _pchinternals-sourcemgr: | |
190 | |
191 Source Manager Block | |
192 ^^^^^^^^^^^^^^^^^^^^ | |
193 | |
194 The source manager block contains the serialized representation of Clang's | |
195 :ref:`SourceManager <SourceManager>` class, which handles the mapping from | |
196 source locations (as represented in Clang's abstract syntax tree) into actual | |
197 column/line positions within a source file or macro instantiation. The AST | |
198 file's representation of the source manager also includes information about all | |
199 of the headers that were (transitively) included when building the AST file. | |
200 | |
201 The bulk of the source manager block is dedicated to information about the | |
202 various files, buffers, and macro instantiations into which a source location | |
203 can refer. Each of these is referenced by a numeric "file ID", which is a | |
204 unique number (allocated starting at 1) stored in the source location. Clang | |
205 serializes the information for each kind of file ID, along with an index that | |
206 maps file IDs to the position within the AST file where the information about | |
207 that file ID is stored. The data associated with a file ID is loaded only when | |
208 required by the front end, e.g., to emit a diagnostic that includes a macro | |
209 instantiation history inside the header itself. | |
210 | |
211 The source manager block also contains information about all of the headers | |
212 that were included when building the AST file. This includes information about | |
213 the controlling macro for the header (e.g., when the preprocessor identified | |
214 that the contents of the header depended on a macro like | |
215 ``LLVM_CLANG_SOURCEMANAGER_H``). | |
216 | |
217 .. _pchinternals-preprocessor: | |
218 | |
219 Preprocessor Block | |
220 ^^^^^^^^^^^^^^^^^^ | |
221 | |
222 The preprocessor block contains the serialized representation of the | |
223 preprocessor. Specifically, it contains all of the macros that have been | |
224 defined by the end of the header used to build the AST file, along with the | |
225 token sequences that comprise each macro. The macro definitions are only read | |
226 from the AST file when the name of the macro first occurs in the program. This | |
227 lazy loading of macro definitions is triggered by lookups into the | |
228 :ref:`identifier table <pchinternals-ident-table>`. | |
229 | |
230 .. _pchinternals-types: | |
231 | |
232 Types Block | |
233 ^^^^^^^^^^^ | |
234 | |
235 The types block contains the serialized representation of all of the types | |
236 referenced in the translation unit. Each Clang type node (``PointerType``, | |
237 ``FunctionProtoType``, etc.) has a corresponding record type in the AST file. | |
238 When types are deserialized from the AST file, the data within the record is | |
239 used to reconstruct the appropriate type node using the AST context. | |
240 | |
241 Each type has a unique type ID, which is an integer that uniquely identifies | |
242 that type. Type ID 0 represents the NULL type, type IDs less than | |
243 ``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.), | |
244 while other "user-defined" type IDs are assigned consecutively from | |
245 ``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has | |
246 an associated mapping from each user-defined type ID to the location within | |
247 the types block where the serialized representation of that type resides, | |
248 enabling lazy deserialization of types. When a type is referenced from within | |
249 the AST file, that reference is encoded using the type ID shifted left by 3 | |
250 bits. The lower three bits are used to represent the ``const``, ``volatile``, | |
251 and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class. | |
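
The encoding of a type reference can be illustrated with a small sketch. The
helper names are hypothetical, and the assignment of individual qualifiers to
individual bits is an assumption made for the example; only the "type ID
shifted left by 3, qualifiers in the low three bits" layout comes from the
description above.

.. code-block:: python

  # Qualifier flags stored in the low three bits of an encoded reference.
  # The specific bit assignment is an assumption for illustration.
  CONST, RESTRICT, VOLATILE = 0x1, 0x2, 0x4
  QUAL_BITS = 3
  QUAL_MASK = (1 << QUAL_BITS) - 1


  def encode_type_ref(type_id, quals=0):
      """Pack a type ID and its qualifier flags into one integer."""
      return (type_id << QUAL_BITS) | (quals & QUAL_MASK)


  def decode_type_ref(ref):
      """Return (type_id, quals) for an encoded type reference."""
      return ref >> QUAL_BITS, ref & QUAL_MASK


  ref = encode_type_ref(42, CONST | VOLATILE)
  type_id, quals = decode_type_ref(ref)
  print(type_id, bool(quals & CONST), bool(quals & VOLATILE))

Packing the qualifiers into the reference means that ``const int`` and ``int``
share one serialized type record and differ only in how they are referenced.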
252 | |
253 .. _pchinternals-decls: | |
254 | |
255 Declarations Block | |
256 ^^^^^^^^^^^^^^^^^^ | |
257 | |
258 The declarations block contains the serialized representation of all of the | |
259 declarations referenced in the translation unit. Each Clang declaration node | |
260 (``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the | |
261 AST file. When declarations are deserialized from the AST file, the data | |
262 within the record is used to build and populate a new instance of the | |
263 corresponding ``Decl`` node. As with types, each declaration node has a | |
264 numeric ID that is used to refer to that declaration within the AST file. In | |
265 addition, a lookup table provides a mapping from that numeric ID to the offset | |
266 within the precompiled header where that declaration is described. | |
267 | |
268 Declarations in Clang's abstract syntax trees are stored hierarchically. At | |
269 the top of the hierarchy is the translation unit (``TranslationUnitDecl``), | |
270 which contains all of the declarations in the translation unit but is not | |
271 actually written as a specific declaration node. Its child declarations (such | |
272 as functions or struct types) may also contain other declarations inside them, | |
273 and so on. Within Clang, each declaration is stored within a :ref:`declaration | |
274 context <DeclContext>`, as represented by the ``DeclContext`` class. | |
275 Declaration contexts provide the mechanism to perform name lookup within a | |
276 given declaration (e.g., find the member named ``x`` in a structure) and | |
277 iterate over the declarations stored within a context (e.g., iterate over all | |
278 of the fields of a structure for structure layout). | |
279 | |
280 In Clang's AST file format, deserializing a declaration that is a | |
281 ``DeclContext`` is a separate operation from deserializing all of the | |
282 declarations stored within that declaration context. Therefore, Clang will | |
283 deserialize the translation unit declaration without deserializing the | |
284 declarations within that translation unit. When required, the declarations | |
285 stored within a declaration context will be deserialized. There are two | |
286 representations of the declarations within a declaration context, which | |
287 correspond to the name-lookup and iteration behavior described above: | |
288 | |
289 * When the front end performs name lookup to find a name ``x`` within a given | |
290 declaration context (for example, during semantic analysis of the expression | |
291 ``p->x``, where ``p``'s type is defined in the precompiled header), Clang | |
292 refers to an on-disk hash table that maps from the names within that | |
293 declaration context to the declaration IDs that represent each visible | |
294 declaration with that name. The actual declarations will then be | |
295 deserialized to provide the results of name lookup. | |
296 * When the front end performs iteration over all of the declarations within a | |
297 declaration context, all of those declarations are immediately | |
298 de-serialized. For large declaration contexts (e.g., the translation unit), | |
299 this operation is expensive; however, large declaration contexts are not | |
300 traversed in normal compilation, since such a traversal is unnecessary. | |
301 However, it is common for the code generator and semantic analysis to | |
302 traverse declaration contexts for structs, classes, unions, and | |
303 enumerations, although those contexts contain relatively few declarations in | |
304 the common case. | |
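
The difference between the two access paths can be modeled as follows. This
is a toy sketch with hypothetical names (``LazyDeclContext``, ``lookup``,
``decls``); an in-memory dictionary stands in for the on-disk hash table.

.. code-block:: python

  class LazyDeclContext:
      """Toy declaration context backed by serialized declarations."""

      def __init__(self, on_disk):
          # `on_disk` maps declaration ID -> (name, payload).
          self._on_disk = on_disk
          # Name -> declaration IDs, standing in for the on-disk hash table.
          self._by_name = {}
          for decl_id, (name, _) in on_disk.items():
              self._by_name.setdefault(name, []).append(decl_id)
          self.deserialized = {}

      def _load(self, decl_id):
          if decl_id not in self.deserialized:
              self.deserialized[decl_id] = self._on_disk[decl_id][1]
          return self.deserialized[decl_id]

      def lookup(self, name):
          # Name lookup touches only the declarations with that name.
          return [self._load(i) for i in self._by_name.get(name, [])]

      def decls(self):
          # Iteration deserializes every declaration in the context.
          return [self._load(i) for i in sorted(self._on_disk)]


  ctx = LazyDeclContext({1: ("x", "int x"), 2: ("y", "float y"), 3: ("z", "char z")})
  ctx.lookup("x")
  after_lookup = len(ctx.deserialized)      # only `x` was needed
  ctx.decls()
  after_iteration = len(ctx.deserialized)   # everything is now loaded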
305 | |
306 Statements and Expressions | |
307 ^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
308 | |
309 Statements and expressions are stored in the AST file in both the :ref:`types | |
310 <pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks, | |
311 because every statement or expression will be associated with either a type or | |
312 declaration. The actual statement and expression records are stored | |
313 immediately following the declaration or type that owns the statement or | |
314 expression. For example, the statement representing the body of a function | |
315 will be stored directly following the declaration of the function. | |
316 | |
317 As with types and declarations, each statement and expression kind in Clang's | |
318 abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding | |
319 record type in the AST file, which contains the serialized representation of | |
320 that statement or expression. Each substatement or subexpression within an | |
321 expression is stored as a separate record (which keeps most records to a fixed | |
322 size). Within the AST file, the subexpressions of an expression are stored, in | |
323 reverse order, prior to the expression that owns those subexpressions, using a form | |
324 of `Reverse Polish Notation | |
325 <http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_. For example, an | |
326 expression ``3 - 4 + 5`` would be represented as follows: | |
327 | |
328 +-----------------------+ | |
329 | ``IntegerLiteral(5)`` | | |
330 +-----------------------+ | |
331 | ``IntegerLiteral(4)`` | | |
332 +-----------------------+ | |
333 | ``IntegerLiteral(3)`` | | |
334 +-----------------------+ | |
335 | ``BinaryOperator(-)`` | |
336 +-----------------------+ | |
337 | ``BinaryOperator(+)`` | |
338 +-----------------------+ | |
339 | ``STOP`` | | |
340 +-----------------------+ | |
341 | |
342 When reading this representation, Clang evaluates each expression record it | |
343 encounters, builds the appropriate abstract syntax tree node, and then pushes | |
344 that expression on to a stack. When a record contains *N* subexpressions --- | |
345 ``BinaryOperator`` has two of them --- those expressions are popped from the | |
346 top of the stack. The special STOP code indicates that we have reached the end | |
347 of a serialized expression or statement; other expression or statement records | |
348 may follow, but they are part of a different expression. | |
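
The stack-based reading procedure can be sketched as follows. This is an
illustrative model, not the ``ASTReader`` code: the record tuples and the
``read_expr`` helper are hypothetical, the two operators are modeled as
``BinaryOperator`` records with two subexpressions each, and the rule that the
first operand popped is the leftmost one is an inference from the reverse
storage order.

.. code-block:: python

  def read_expr(records):
      """Rebuild one expression tree from a flat record stream.

      Each record is (kind, value, num_subexprs). Operands are stored in
      reverse order, so the first operand popped is the leftmost one.
      """
      stack = []
      for kind, value, num_subexprs in records:
          if kind == "STOP":
              break
          operands = [stack.pop() for _ in range(num_subexprs)]
          stack.append((kind, value, operands) if operands else (kind, value))
      assert len(stack) == 1, "a well-formed stream leaves exactly one root"
      return stack[0]


  # Record stream for ``3 - 4 + 5``, in the order shown in the table above.
  stream = [
      ("IntegerLiteral", 5, 0),
      ("IntegerLiteral", 4, 0),
      ("IntegerLiteral", 3, 0),
      ("BinaryOperator", "-", 2),
      ("BinaryOperator", "+", 2),
      ("STOP", None, 0),
  ]
  tree = read_expr(stream)

The resulting tree has ``+`` at the root, with ``3 - 4`` as its left operand
and ``5`` as its right operand, which matches how ``3 - 4 + 5`` parses.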
349 | |
350 .. _pchinternals-ident-table: | |
351 | |
352 Identifier Table Block | |
353 ^^^^^^^^^^^^^^^^^^^^^^ | |
354 | |
355 The identifier table block contains an on-disk hash table that maps each | |
356 identifier mentioned within the AST file to the serialized representation of | |
357 the identifier's information (e.g., the ``IdentifierInfo`` structure). The | |
358 serialized representation contains: | |
359 | |
360 * The actual identifier string. | |
361 * Flags that describe whether this identifier is the name of a built-in, a | |
362 poisoned identifier, an extension token, or a macro. | |
363 * If the identifier names a macro, the offset of the macro definition within | |
364 the :ref:`pchinternals-preprocessor`. | |
365 * If the identifier names one or more declarations visible from translation | |
366 unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these | |
367 declarations. | |
368 | |
369 When an AST file is loaded, the AST file reader mechanism introduces itself | |
370 into the identifier table as an external lookup source. Thus, when the user | |
371 program refers to an identifier that has not yet been seen, Clang will perform | |
372 a lookup into the identifier table. If an identifier is found, its contents | |
373 (macro definitions, flags, top-level declarations, etc.) will be deserialized, | |
374 at which point the corresponding ``IdentifierInfo`` structure will have the | |
375 same contents it would have after parsing the headers in the AST file. | |
376 | |
377 Within the AST file, the identifiers used to name declarations are represented | |
378 with an integral value. A separate table provides a mapping from this integral | |
379 value (the identifier ID) to the location within the on-disk hash table where | |
380 that identifier is stored. This mapping is used when deserializing the name of | |
381 a declaration, the identifier of a token, or any other construct in the AST | |
382 file that refers to a name. | |
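
The "external lookup source" arrangement can be modeled with two small
classes. This is a sketch under stated assumptions: ``IdentifierTable`` and
``OnDiskTable`` here are simplified stand-ins written for this example, and a
plain dictionary replaces the on-disk hash table.

.. code-block:: python

  class OnDiskTable:
      """Stand-in for the on-disk hash table stored in the AST file."""

      def __init__(self, data):
          self._data = data
          self.lookups = 0

      def lookup(self, name):
          self.lookups += 1
          return self._data.get(name)


  class IdentifierTable:
      """In-memory table that consults an external source on a miss."""

      def __init__(self, external=None):
          self._entries = {}
          self._external = external  # hypothetical stand-in for the AST reader

      def get(self, name):
          if name not in self._entries:
              info = self._external.lookup(name) if self._external else None
              # Identifiers unknown to the AST file get a fresh, empty entry.
              self._entries[name] = info if info is not None else {"name": name}
          return self._entries[name]


  disk = OnDiskTable({"NULL": {"name": "NULL", "is_macro": True}})
  table = IdentifierTable(external=disk)
  table.get("NULL")
  table.get("NULL")   # second use is served from memory
  table.get("main")   # not in the AST file
  print(disk.lookups)

Because the external source is consulted only the first time an identifier is
seen, repeated uses of the same name cost nothing beyond the initial lookup.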
383 | |
384 .. _pchinternals-method-pool: | |
385 | |
386 Method Pool Block | |
387 ^^^^^^^^^^^^^^^^^ | |
388 | |
389 The method pool block is represented as an on-disk hash table that serves two | |
390 purposes: it provides a mapping from the names of Objective-C selectors to the | |
391 set of Objective-C instance and class methods that have that particular | |
392 selector (which is required for semantic analysis in Objective-C) and also | |
393 stores all of the selectors used by entities within the AST file. The design | |
394 of the method pool is similar to that of the :ref:`identifier table | |
395 <pchinternals-ident-table>`: the first time a particular selector is formed | |
396 during the compilation of the program, Clang will search in the on-disk hash | |
397 table of selectors; if found, Clang will read the Objective-C methods | |
398 associated with that selector into the appropriate front-end data structure | |
399 (``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and | |
400 class methods, respectively). | |
401 | |
402 As with identifiers, selectors are represented by numeric values within the AST | |
403 file. A separate index maps these numeric selector values to the offset of the | |
404 selector within the on-disk hash table, and will be used when de-serializing an | |
405 Objective-C method declaration (or other Objective-C construct) that refers to | |
406 the selector. | |
407 | |
408 AST Reader Integration Points | |
409 ----------------------------- | |
410 | |
411 The "lazy" deserialization behavior of AST files requires their integration | |
412 into several completely different submodules of Clang. For example, lazily | |
413 deserializing the declarations during name lookup requires that the name-lookup | |
414 routines be able to query the AST file to find entities stored there. | |
415 | |
416 For each Clang data structure that requires direct interaction with the AST | |
417 reader logic, there is an abstract class that provides the interface between | |
418 the two modules. The ``ASTReader`` class, which handles the loading of an AST | |
419 file, inherits from all of these abstract classes to provide lazy | |
420 deserialization of Clang's data structures. ``ASTReader`` implements the | |
421 following abstract classes: | |
422 | |
423 ``ExternalSLocEntrySource`` | |
424 This abstract interface is associated with the ``SourceManager`` class, and | |
425 is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to | |
426 load the details of a file, buffer, or macro instantiation. | |
427 | |
428 ``IdentifierInfoLookup`` | |
429 This abstract interface is associated with the ``IdentifierTable`` class, and | |
430 is used whenever the program source refers to an identifier that has not yet | |
431 been seen. In this case, the AST reader searches for this identifier within | |
432 its :ref:`identifier table <pchinternals-ident-table>` to load any top-level | |
433 declarations or macros associated with that identifier. | |
434 | |
435 ``ExternalASTSource`` | |
436 This abstract interface is associated with the ``ASTContext`` class, and is | |
437 used whenever the abstract syntax tree nodes need to be loaded from the AST | |
438 file. It provides the ability to de-serialize declarations and types | |
439 identified by their numeric values, read the bodies of functions when | |
440 required, and read the declarations stored within a declaration context | |
441 (either for iteration or for name lookup). | |
442 | |
443 ``ExternalSemaSource`` | |
444 This abstract interface is associated with the ``Sema`` class, and is used | |
445 whenever semantic analysis needs to read information from the :ref:`global | |
446 method pool <pchinternals-method-pool>`. | |
447 | |
448 .. _pchinternals-chained: | |
449 | |
450 Chained precompiled headers | |
451 --------------------------- | |
452 | |
453 Chained precompiled headers were initially intended to improve the performance | |
454 of IDE-centric operations such as syntax highlighting and code completion while | |
455 a particular source file is being edited by the user. To minimize the amount | |
456 of reparsing required after a change to the file, a form of precompiled header | |
457 --- called a precompiled *preamble* --- is automatically generated by parsing | |
458 all of the headers in the source file, up to and including the last | |
459 ``#include``. When only the source file changes (and none of the headers it | |
460 depends on), reparsing of that source file can use the precompiled preamble and | |
461 start parsing after the ``#include``\ s, so parsing time is proportional to the | |
462 size of the source file (rather than all of its includes). However, the | |
463 compilation of that translation unit may already use a precompiled header: in | |
464 this case, Clang will create the precompiled preamble as a chained precompiled | |
465 header that refers to the original precompiled header. This drastically | |
466 reduces the time needed to serialize the precompiled preamble for use in | |
467 reparsing. | |
468 | |
469 Chained precompiled headers get their name because each precompiled header can | |
470 depend on one other precompiled header, forming a chain of dependencies. A | |
471 translation unit will then include the precompiled header that starts the chain | |
472 (i.e., nothing depends on it). This linearity of dependencies is important for | |
473 the semantic model of chained precompiled headers, because the most-recent | |
474 precompiled header can provide information that overrides the information | |
475 provided by the precompiled headers it depends on, just like a header file | |
476 ``B.h`` that includes another header ``A.h`` can modify the state produced by | |
477 parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``. | |
478 | |
479 There are several ways in which chained precompiled headers generalize the AST | |
480 file model: | |
481 | |
482 Numbering of IDs | |
483 Many different kinds of entities --- identifiers, declarations, types, etc. | |
484 --- have ID numbers that start at 1 or some other predefined constant and | |
485 grow upward. Each precompiled header records the maximum ID number it has | |
486 assigned in each category. Then, when a new precompiled header is generated | |
487 that depends on (chains to) another precompiled header, it will start | |
488 counting at the next available ID number. This way, one can determine, given | |
489 an ID number, which AST file actually contains the entity. | |
490 | |
491 Name lookup | |
492 When writing a chained precompiled header, Clang attempts to write only | |
493 information that has changed from the precompiled header on which it is | |
494 based. This changes the lookup algorithm for the various tables, such as the | |
495 :ref:`identifier table <pchinternals-ident-table>`: the search starts at the | |
496 most-recent precompiled header. If no entry is found, lookup then proceeds | |
497 to the identifier table in the precompiled header it depends on, and so on. | |
498 Once a lookup succeeds, that result is considered definitive, overriding any | |
499 results from earlier precompiled headers. | |
500 | |
501 Update records | |
502 There are various ways in which a later precompiled header can modify the | |
503 entities described in an earlier precompiled header. For example, later | |
504 precompiled headers can add entries into the various name-lookup tables for | |
505 the translation unit or namespaces, or add new categories to an Objective-C | |
506 class. Each of these updates is captured in an "update record" that is | |
507 stored in the chained precompiled header file and will be loaded along with | |
508 the original entity. | |
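
The ID numbering and name lookup rules for a chain can be modeled together.
The following is a toy sketch, not Clang's data structures: ``ChainedPCH`` and
its methods are hypothetical, each entry is assumed to consume exactly one ID,
and update records are not modeled.

.. code-block:: python

  class ChainedPCH:
      """Toy chained PCH: stores only its own entries plus a link to its base."""

      def __init__(self, entries, base=None):
          self.entries = entries
          self.base = base
          # IDs continue where the base file stopped (the first file starts at 1).
          self.first_id = base.first_id + len(base.entries) if base else 1

      def lookup(self, name):
          # The most-recent file wins; fall back along the chain on a miss.
          pch = self
          while pch is not None:
              if name in pch.entries:
                  return pch.entries[name]
              pch = pch.base
          return None

      def owner_of(self, entity_id):
          # Walk the chain to find the file whose ID range contains entity_id.
          pch = self
          while pch is not None:
              if pch.first_id <= entity_id < pch.first_id + len(pch.entries):
                  return pch
              pch = pch.base
          return None


  common = ChainedPCH({"DEBUG": "0", "VERSION": "7"})
  per_file = ChainedPCH({"DEBUG": "1"}, base=common)
  print(per_file.lookup("DEBUG"), per_file.lookup("VERSION"))

Here ``per_file`` overrides ``DEBUG`` but inherits ``VERSION``, and because the
ID ranges do not overlap, an ID alone identifies the file that owns the entity.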
509 | |
510 .. _pchinternals-modules: | |
511 | |
512 Modules | |
513 ------- | |
514 | |
515 Modules generalize the chained precompiled header model yet further, from a | |
516 linear chain of precompiled headers to an arbitrary directed acyclic graph | |
517 (DAG) of AST files. All of the same techniques used to make chained | |
518 precompiled headers work --- ID numbering, name lookup, update records --- are | |
519 shared with modules. However, the DAG nature of modules introduces a number of | |
520 additional complications to the model: | |
521 | |
522 Numbering of IDs | |
523 The simple, linear numbering scheme used in chained precompiled headers falls | |
524 apart with the module DAG, because different modules may end up with | |
525 different numbering schemes for entities they imported from common shared | |
526 modules. To account for this, each module file provides information about | |
527 which modules it depends on and which ID numbers it assigned to the entities | |
528 in those modules, as well as which ID numbers it took for its own new | |
529 entities. The AST reader then maps these "local" ID numbers into a "global" | |
530 ID number space for the current translation unit, providing a 1-1 mapping | |
531 between entities (in whatever AST file they inhabit) and global ID numbers. | |
532 If that translation unit is then serialized into an AST file, this mapping | |
533 will be stored for use when the AST file is imported. | |
534 | |
535 Declaration merging | |
536 It is possible for a given entity (from the language's perspective) to be | |
537 declared multiple times in different places. For example, two different | |
538 headers can have the declaration of ``printf`` or could forward-declare | |
539 ``struct stat``. If each of those headers is included in a module, and some | |
540 third party imports both of those modules, there is a potentially serious | |
541 problem: name lookup for ``printf`` or ``struct stat`` will find both | |
542 declarations, but the AST nodes are unrelated. This would result in a | |
543 compilation error, due to an ambiguity in name lookup. Therefore, the AST | |
544 reader performs declaration merging according to the appropriate language | |
545 semantics, ensuring that the two disjoint declarations are merged into a | |
546 single redeclaration chain (with a common canonical declaration), so that it | |
547 is as if one of the headers had been included before the other. | |
548 | |
549 Name Visibility | |
550 Modules allow certain names that occur during module creation to be "hidden", | |
551 so that they are not part of the public interface of the module and are not | |
552 visible to its clients. The AST reader maintains a "visible" bit on various | |
553 AST nodes (declarations, macros, etc.) to indicate whether that particular | |
554 AST node is currently visible; the various name lookup mechanisms in Clang | |
555 inspect the visible bit to determine whether that entity, which is still in | |
556 the AST (because other, visible AST nodes may depend on it), can actually be | |
557 found by name lookup. When a new (sub)module is imported, it may make | |
558 existing, non-visible, already-deserialized AST nodes visible; it is the | |
559 responsibility of the AST reader to find and update these AST nodes when it | |
560 is notified of the import. | |
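
The mapping from per-module "local" IDs to a single "global" ID space can be
sketched as follows. This is a deliberately simplified model with hypothetical
names (``ModuleFile``, ``Reader``): each module file holds only its own
entities, and the reader assigns each newly loaded module a contiguous range
of global IDs.

.. code-block:: python

  class ModuleFile:
      """Toy module file with its own local ID space starting at 1."""

      def __init__(self, name, entities):
          self.name = name
          self.entities = entities  # index i holds the entity with local ID i + 1


  class Reader:
      """Assigns each loaded module a base offset in one global ID space."""

      def __init__(self):
          self._base = {}     # module name -> offset added to a local ID
          self._modules = {}
          self._next = 1

      def load(self, module):
          if module.name not in self._base:  # a shared module is loaded once
              self._base[module.name] = self._next - 1
              self._modules[module.name] = module
              self._next += len(module.entities)

      def to_global(self, module_name, local_id):
          return self._base[module_name] + local_id

      def resolve(self, global_id):
          for name, base in self._base.items():
              module = self._modules[name]
              if base < global_id <= base + len(module.entities):
                  return name, module.entities[global_id - base - 1]
          raise KeyError(global_id)


  reader = Reader()
  reader.load(ModuleFile("std", ["printf", "FILE"]))
  reader.load(ModuleFile("ui", ["Window"]))
  reader.load(ModuleFile("std", ["printf", "FILE"]))  # reached again via another import
  print(reader.to_global("ui", 1), reader.resolve(3))

Loading ``std`` a second time does not allocate new global IDs, so every
entity keeps exactly one global ID no matter how many import paths reach it.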
561 |