annotate docs/PDB/index.rst @ 121:803732b1fca8

LLVM 5.0
author kono
date Fri, 27 Oct 2017 17:07:41 +0900
parents 1172e4bd9c6f
children c2174574ed3a
=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft. It contains
debug information that debuggers and other tools can consume. Because
officially supported APIs exist on Windows for querying debug information from
PDBs without the user having to understand the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. For
Clang to generate programs that can interoperate with these tools, we must
generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. It is
therefore necessary for us to understand the PDB file format at the byte level
so that we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within it, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

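As a small illustration of the note above, the following Python sketch decodes
a ``uint16_t``, a ``uint32_t``, and a ``uint64_t`` from a little-endian byte
buffer. The buffer contents are made up for this example and do not come from
a real PDB file.

```python
import struct

# Hypothetical 14-byte buffer holding a uint16_t, a uint32_t, and a uint64_t,
# each stored least-significant byte first.
raw = bytes.fromhex("3412" "78563412" "efcdab8967452301")

# "<" selects little endian with no padding between fields;
# H, I, and Q are 2-, 4-, and 8-byte unsigned integers.
u16, u32, u64 = struct.unpack("<HIQ", raw)

assert u16 == 0x1234
assert u32 == 0x12345678
assert u64 == 0x0123456789ABCDEF

# int.from_bytes gives the same result for a single field.
first_field = int.from_bytes(raw[:2], "little")
assert first_field == u16
```

Reading the same bytes as big endian would yield ``0x3412`` for the first
field, which is why the byte order has to be fixed up front.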

.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is a special case of an MSF (Multi-Stream Format) file. An MSF file
is a miniature "file system within a file". It contains multiple streams (aka
files) which can represent arbitrary data. Each stream is divided into blocks
that are not necessarily laid out contiguously within the file (that is, a
stream may be fragmented). Additionally, the MSF contains a stream directory
(aka MFT) which describes how the streams (files) are laid out within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

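To make the "file system within a file" idea concrete, here is a minimal
Python sketch of how a fragmented stream could be reassembled once its block
list is known. The function name ``read_stream``, the 4-byte block size, and
the block indices are assumptions chosen for illustration; a real reader would
take the block size and each stream's block list from the MSF header and
stream directory described in :doc:`MsfFile`.

```python
def read_stream(file_bytes, block_size, block_indices, stream_length):
    """Reassemble one stream from blocks scattered through an MSF-like file.

    All parameters are hypothetical inputs for this sketch; nothing here
    parses a real MSF header or stream directory.
    """
    chunks = []
    for index in block_indices:
        start = index * block_size
        chunks.append(file_bytes[start:start + block_size])
    # The last block is usually only partly used, so trim to the real length.
    return b"".join(chunks)[:stream_length]


# Toy "file" made of four 4-byte blocks. Block 0 stands in for a header.
# The 10-byte stream lives in blocks 3, 1, and 2, in that order.
BLOCK_SIZE = 4
toy_file = b"\x00\x00\x00\x00" + b"o, P" + b"DB\x00\x00" + b"Hell"

stream = read_stream(toy_file, BLOCK_SIZE, [3, 1, 2], 10)
assert stream == b"Hello, PDB"
```

The point of the sketch is only that the order of blocks in the file need not
match the order of bytes in the stream; the directory is what ties them
together.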

.. _streams:

Streams
-------
The PDB format contains a number of streams that describe the types, symbols,
source files, and compilands (e.g. object files) of a program. Additional
streams contain hash tables that debuggers and other tools use for fast lookup
of records and types by name, and others record how the program was compiled,
such as the specific toolchain used. A summary of the streams contained in a
PDB file is as follows:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

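The fixed indices in the table can be captured in a small lookup, sketched
below in Python. The names ``FixedStream`` and ``describe_stream`` are
hypothetical and are not part of any LLVM API; only the index values 0 through
4 come from the table.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Streams that, per the table above, always live at a fixed index."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def describe_stream(index):
    """Return the fixed stream's name, or a note that the index is not fixed."""
    try:
        return FixedStream(index).name
    except ValueError:
        # Every other stream is found indirectly: through the named stream
        # map in the PDB Stream, or through an index stored in the DBI, TPI,
        # or IPI stream.
        return "not fixed; located via another stream"


assert describe_stream(3) == "DBI"
assert describe_stream(7) == "not fixed; located via another stream"
```

A reader would typically open the fixed streams first and then follow the
indices they contain to reach the remaining streams.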

More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams, including the module substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream, and how it can be used to quickly look up records
   by name.


CodeView
========
CodeView is another format that comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.