Mercurial > hg > CbC > CbC_llvm
comparison lld/docs/NewLLD.rst @ 150:1d019706d866
LLVM10
author | anatofuz |
---|---|
date | Thu, 13 Feb 2020 15:10:13 +0900 |
parents | |
children | 0572611fdcc8 |
comparison
equal
deleted
inserted
replaced
147:c2174574ed3a | 150:1d019706d866 |
---|---|
1 The ELF, COFF and Wasm Linkers | |
2 ============================== | |
3 | |
4 The ELF Linker as a Library | |
5 --------------------------- | |
6 | |
7 You can embed LLD to your program by linking against it and calling the linker's | |
8 entry point function lld::elf::link. | |
9 | |
10 The current policy is that it is your responsibility to give trustworthy object | |
11 files. The function is guaranteed to return as long as you do not pass corrupted | |
12 or malicious object files. A corrupted file could cause a fatal error or SEGV. | |
13 That being said, you don't need to worry too much about it if you create object | |
14 files in the usual way and give them to the linker. It is naturally expected to | |
15 work, or otherwise it's a linker's bug. | |
16 | |
17 Design | |
18 ====== | |
19 | |
20 We will describe the design of the linkers in the rest of the document. | |
21 | |
22 Key Concepts | |
23 ------------ | |
24 | |
25 Linkers are fairly large pieces of software. | |
26 There are many design choices you have to make to create a complete linker. | |
27 | |
28 This is a list of design choices we've made for ELF and COFF LLD. | |
29 We believe that these high-level design choices achieved a right balance | |
30 between speed, simplicity and extensibility. | |
31 | |
32 * Implement as native linkers | |
33 | |
34 We implemented the linkers as native linkers for each file format. | |
35 | |
36 The linkers share the same design but share very little code. | |
37 Sharing code makes sense if the benefit is worth its cost. | |
38 In our case, the object formats are different enough that we thought the layer | |
39 to abstract the differences wouldn't be worth its complexity and run-time | |
40 cost. Elimination of the abstract layer has greatly simplified the | |
41 implementation. | |
42 | |
43 * Speed by design | |
44 | |
45 One of the most important things in archiving high performance is to | |
46 do less rather than do it efficiently. | |
47 Therefore, the high-level design matters more than local optimizations. | |
48 Since we are trying to create a high-performance linker, | |
49 it is very important to keep the design as efficient as possible. | |
50 | |
51 Broadly speaking, we do not do anything until we have to do it. | |
52 For example, we do not read section contents or relocations | |
53 until we need them to continue linking. | |
54 When we need to do some costly operation (such as looking up | |
55 a hash table for each symbol), we do it only once. | |
56 We obtain a handle (which is typically just a pointer to actual data) | |
57 on the first operation and use it throughout the process. | |
58 | |
59 * Efficient archive file handling | |
60 | |
61 LLD's handling of archive files (the files with ".a" file extension) is | |
62 different from the traditional Unix linkers and similar to Windows linkers. | |
63 We'll describe how the traditional Unix linker handles archive files, what the | |
64 problem is, and how LLD approached the problem. | |
65 | |
66 The traditional Unix linker maintains a set of undefined symbols during | |
67 linking. The linker visits each file in the order as they appeared in the | |
68 command line until the set becomes empty. What the linker would do depends on | |
69 file type. | |
70 | |
71 - If the linker visits an object file, the linker links object files to the | |
72 result, and undefined symbols in the object file are added to the set. | |
73 | |
74 - If the linker visits an archive file, it checks for the archive file's | |
75 symbol table and extracts all object files that have definitions for any | |
76 symbols in the set. | |
77 | |
78 This algorithm sometimes leads to a counter-intuitive behavior. If you give | |
79 archive files before object files, nothing will happen because when the linker | |
80 visits archives, there is no undefined symbols in the set. As a result, no | |
81 files are extracted from the first archive file, and the link is done at that | |
82 point because the set is empty after it visits one file. | |
83 | |
84 You can fix the problem by reordering the files, | |
85 but that cannot fix the issue of mutually-dependent archive files. | |
86 | |
87 Linking mutually-dependent archive files is tricky. You may specify the same | |
88 archive file multiple times to let the linker visit it more than once. Or, | |
89 you may use the special command line options, `--start-group` and | |
90 `--end-group`, to let the linker loop over the files between the options until | |
91 no new symbols are added to the set. | |
92 | |
93 Visiting the same archive files multiple times makes the linker slower. | |
94 | |
95 Here is how LLD approaches the problem. Instead of memorizing only undefined | |
96 symbols, we program LLD so that it memorizes all symbols. When it sees an | |
97 undefined symbol that can be resolved by extracting an object file from an | |
98 archive file it previously visited, it immediately extracts the file and links | |
99 it. It is doable because LLD does not forget symbols it has seen in archive | |
100 files. | |
101 | |
102 We believe that LLD's way is efficient and easy to justify. | |
103 | |
104 The semantics of LLD's archive handling are different from the traditional | |
105 Unix's. You can observe it if you carefully craft archive files to exploit | |
106 it. However, in reality, we don't know any program that cannot link with our | |
107 algorithm so far, so it's not going to cause trouble. | |
108 | |
109 Numbers You Want to Know | |
110 ------------------------ | |
111 | |
112 To give you intuition about what kinds of data the linker is mainly working on, | |
113 I'll give you the list of objects and their numbers LLD has to read and process | |
114 in order to link a very large executable. In order to link Chrome with debug | |
115 info, which is roughly 2 GB in output size, LLD reads | |
116 | |
117 - 17,000 files, | |
118 - 1,800,000 sections, | |
119 - 6,300,000 symbols, and | |
120 - 13,000,000 relocations. | |
121 | |
122 LLD produces the 2 GB executable in 15 seconds. | |
123 | |
124 These numbers vary depending on your program, but in general, | |
125 you have a lot of relocations and symbols for each file. | |
126 If your program is written in C++, symbol names are likely to be | |
127 pretty long because of name mangling. | |
128 | |
129 It is important to not waste time on relocations and symbols. | |
130 | |
131 In the above case, the total amount of symbol strings is 450 MB, | |
132 and inserting all of them to a hash table takes 1.5 seconds. | |
133 Therefore, if you causally add a hash table lookup for each symbol, | |
134 it would slow down the linker by 10%. So, don't do that. | |
135 | |
136 On the other hand, you don't have to pursue efficiency | |
137 when handling files. | |
138 | |
139 Important Data Structures | |
140 ------------------------- | |
141 | |
142 We will describe the key data structures in LLD in this section. The linker can | |
143 be understood as the interactions between them. Once you understand their | |
144 functions, the code of the linker should look obvious to you. | |
145 | |
146 * Symbol | |
147 | |
148 This class represents a symbol. | |
149 They are created for symbols in object files or archive files. | |
150 The linker creates linker-defined symbols as well. | |
151 | |
152 There are basically three types of Symbols: Defined, Undefined, or Lazy. | |
153 | |
154 - Defined symbols are for all symbols that are considered as "resolved", | |
155 including real defined symbols, COMDAT symbols, common symbols, | |
156 absolute symbols, linker-created symbols, etc. | |
157 - Undefined symbols represent undefined symbols, which need to be replaced by | |
158 Defined symbols by the resolver until the link is complete. | |
159 - Lazy symbols represent symbols we found in archive file headers | |
160 which can turn into Defined if we read archive members. | |
161 | |
162 There's only one Symbol instance for each unique symbol name. This uniqueness | |
163 is guaranteed by the symbol table. As the resolver reads symbols from input | |
164 files, it replaces an existing Symbol with the "best" Symbol for its symbol | |
165 name using the placement new. | |
166 | |
167 The above mechanism allows you to use pointers to Symbols as a very cheap way | |
168 to access name resolution results. Assume for example that you have a pointer | |
169 to an undefined symbol before name resolution. If the symbol is resolved to a | |
170 defined symbol by the resolver, the pointer will "automatically" point to the | |
171 defined symbol, because the undefined symbol the pointer pointed to will have | |
172 been replaced by the defined symbol in-place. | |
173 | |
174 * SymbolTable | |
175 | |
176 SymbolTable is basically a hash table from strings to Symbols | |
177 with logic to resolve symbol conflicts. It resolves conflicts by symbol type. | |
178 | |
179 - If we add Defined and Undefined symbols, the symbol table will keep the | |
180 former. | |
181 - If we add Defined and Lazy symbols, it will keep the former. | |
182 - If we add Lazy and Undefined, it will keep the former, | |
183 but it will also trigger the Lazy symbol to load the archive member | |
184 to actually resolve the symbol. | |
185 | |
186 * Chunk (COFF specific) | |
187 | |
188 Chunk represents a chunk of data that will occupy space in an output. | |
189 Each regular section becomes a chunk. | |
190 Chunks created for common or BSS symbols are not backed by sections. | |
191 The linker may create chunks to append additional data to an output as well. | |
192 | |
193 Chunks know about their size, how to copy their data to mmap'ed outputs, | |
194 and how to apply relocations to them. | |
195 Specifically, section-based chunks know how to read relocation tables | |
196 and how to apply them. | |
197 | |
198 * InputSection (ELF specific) | |
199 | |
200 Since we have less synthesized data for ELF, we don't abstract slices of | |
201 input files as Chunks for ELF. Instead, we directly use the input section | |
202 as an internal data type. | |
203 | |
204 InputSection knows about their size and how to copy themselves to | |
205 mmap'ed outputs, just like COFF Chunks. | |
206 | |
207 * OutputSection | |
208 | |
209 OutputSection is a container of InputSections (ELF) or Chunks (COFF). | |
210 An InputSection or Chunk belongs to at most one OutputSection. | |
211 | |
212 There are mainly three actors in this linker. | |
213 | |
214 * InputFile | |
215 | |
216 InputFile is a superclass of file readers. | |
217 We have a different subclass for each input file type, | |
218 such as regular object file, archive file, etc. | |
219 They are responsible for creating and owning Symbols and InputSections/Chunks. | |
220 | |
221 * Writer | |
222 | |
223 The writer is responsible for writing file headers and InputSections/Chunks to | |
224 a file. It creates OutputSections, put all InputSections/Chunks into them, | |
225 assign unique, non-overlapping addresses and file offsets to them, and then | |
226 write them down to a file. | |
227 | |
228 * Driver | |
229 | |
230 The linking process is driven by the driver. The driver: | |
231 | |
232 - processes command line options, | |
233 - creates a symbol table, | |
234 - creates an InputFile for each input file and puts all symbols within into | |
235 the symbol table, | |
236 - checks if there's no remaining undefined symbols, | |
237 - creates a writer, | |
238 - and passes the symbol table to the writer to write the result to a file. | |
239 | |
240 Link-Time Optimization | |
241 ---------------------- | |
242 | |
243 LTO is implemented by handling LLVM bitcode files as object files. | |
244 The linker resolves symbols in bitcode files normally. If all symbols | |
245 are successfully resolved, it then runs LLVM passes | |
246 with all bitcode files to convert them to one big regular ELF/COFF file. | |
247 Finally, the linker replaces bitcode symbols with ELF/COFF symbols, | |
248 so that they are linked as if they were in the native format from the beginning. | |
249 | |
250 The details are described in this document. | |
251 http://llvm.org/docs/LinkTimeOptimization.html | |
252 | |
253 Glossary | |
254 -------- | |
255 | |
256 * RVA (COFF) | |
257 | |
258 Short for Relative Virtual Address. | |
259 | |
260 Windows executables or DLLs are not position-independent; they are | |
261 linked against a fixed address called an image base. RVAs are | |
262 offsets from an image base. | |
263 | |
264 Default image bases are 0x140000000 for executables and 0x18000000 | |
265 for DLLs. For example, when we are creating an executable, we assume | |
266 that the executable will be loaded at address 0x140000000 by the | |
267 loader, so we apply relocations accordingly. Result texts and data | |
268 will contain raw absolute addresses. | |
269 | |
270 * VA | |
271 | |
272 Short for Virtual Address. For COFF, it is equivalent to RVA + image base. | |
273 | |
274 * Base relocations (COFF) | |
275 | |
276 Relocation information for the loader. If the loader decides to map | |
277 an executable or a DLL to a different address than their image | |
278 bases, it fixes up binaries using information contained in the base | |
279 relocation table. A base relocation table consists of a list of | |
280 locations containing addresses. The loader adds a difference between | |
281 RVA and actual load address to all locations listed there. | |
282 | |
283 Note that this run-time relocation mechanism is much simpler than ELF. | |
284 There's no PLT or GOT. Images are relocated as a whole just | |
285 by shifting entire images in memory by some offsets. Although doing | |
286 this breaks text sharing, I think this mechanism is not actually bad | |
287 on today's computers. | |
288 | |
289 * ICF | |
290 | |
291 Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF). | |
292 | |
293 ICF is an optimization to reduce output size by merging read-only sections | |
294 by not only their names but by their contents. If two read-only sections | |
295 happen to have the same metadata, actual contents and relocations, | |
296 they are merged by ICF. It is known as an effective technique, | |
297 and it usually reduces C++ program's size by a few percent or more. | |
298 | |
299 Note that this is not an entirely sound optimization. C/C++ require | |
300 different functions have different addresses. If a program depends on | |
301 that property, it would fail at runtime. | |
302 | |
303 On Windows, that's not really an issue because MSVC link.exe enabled | |
304 the optimization by default. As long as your program works | |
305 with the linker's default settings, your program should be safe with ICF. | |
306 | |
307 On Unix, your program is generally not guaranteed to be safe with ICF, | |
308 although large programs happen to work correctly. | |
309 LLD works fine with ICF for example. | |
310 | |
311 Other Info | |
312 ---------- | |
313 | |
314 .. toctree:: | |
315 :maxdepth: 1 | |
316 | |
317 missingkeyfunction |