=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information for consumption by debuggers and other tools. Because
officially supported APIs exist on Windows for querying debug information from
PDBs without requiring the user to understand the internals of the file format,
a large ecosystem of tools has been built for Windows to consume this format.
For Clang to generate programs that interoperate with these tools, we must
generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. We
therefore need to understand the PDB file format at the byte level so that we
can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within it, the format of individual
records within those streams, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

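As a minimal illustration of the little-endian convention, the sketch below decodes fixed-width unsigned integers from a byte buffer using Python's standard ``struct`` module. The helper names ``read_u16``, ``read_u32``, and ``read_u64`` are hypothetical and exist only for this example; they are not part of any PDB library.

```python
import struct


def read_u16(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint16_t starting at `offset`."""
    return struct.unpack_from("<H", buf, offset)[0]


def read_u32(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint32_t starting at `offset`."""
    return struct.unpack_from("<I", buf, offset)[0]


def read_u64(buf: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint64_t starting at `offset`."""
    return struct.unpack_from("<Q", buf, offset)[0]


# The least significant byte comes first in the buffer.
sample = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])
low_word = read_u16(sample, 0)    # bytes 34 12       -> 0x1234
next_dword = read_u32(sample, 2)  # bytes 78 56 34 12 -> 0x12345678
```

The ``<`` prefix in each format string is what selects little-endian byte order regardless of the host platform, which matters when cross-compiling.
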
.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream
   CodeViewSymbols
   CodeViewTypes

.. _msf:

The MSF Container
-----------------
A PDB file is a special case of an MSF (Multi-Stream Format) file. An MSF file
is a miniature "file system within a file". It contains multiple streams
(analogous to files) that can hold arbitrary data. Each stream is divided into
blocks, which are not necessarily laid out contiguously within the file (that
is, a stream may be fragmented). Additionally, the MSF contains a stream
directory (analogous to an MFT) that describes how the streams are laid out
within the MSF.

For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

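To make the "file system within a file" idea concrete, the following is a rough sketch of how a fragmented stream could be reassembled once its block list is known. It assumes, for illustration only, that block ``i`` starts at byte offset ``i * block_size`` and that the stream's block list and byte size have already been obtained from the stream directory; the function name ``read_stream`` and its parameters are hypothetical. See :doc:`MsfFile` for the actual on-disk layout.

```python
from typing import Sequence


def read_stream(image: bytes, block_size: int,
                blocks: Sequence[int], size: int) -> bytes:
    """Concatenate a stream's blocks in order and trim to its real size.

    `blocks` lists the block indices that make up the stream, in stream
    order; they need not be contiguous or ascending within the file.
    """
    out = bytearray()
    for index in blocks:
        start = index * block_size  # assumed mapping from block index to offset
        out += image[start:start + block_size]
    # The last block is usually only partially used by the stream.
    return bytes(out[:size])


# Toy "file" made of four 4-byte blocks; the stream lives in blocks 3 and 1.
toy_image = b"AAAABBBBCCCCDDDD"
stream = read_stream(toy_image, block_size=4, blocks=[3, 1], size=6)
```

In the toy example the stream's bytes come from block 3 followed by the first two bytes of block 1, even though block 1 precedes block 3 in the file.
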
.. _streams:

Streams
-------
The PDB format contains a number of streams that describe the types, symbols,
source files, and compilands (e.g. object files) of a program. Additional
streams contain hash tables that debuggers and other tools use for fast lookup
of records and types by name, as well as information about how the program was
compiled, such as the specific toolchain used. The streams contained in a PDB
file are summarized below:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

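The fixed stream indices listed in the summary table can be captured as a small enumeration. This is only an illustrative sketch: the names ``FixedStream`` and ``describe_stream`` are hypothetical and chosen for this example. Every stream without a fixed index has to be located indirectly, through the PDB stream's named stream map or through the DBI, TPI, or IPI stream.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Streams that always live at the same index in a PDB file."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def describe_stream(index: int) -> str:
    """Return the fixed stream's name, or a note that the index is not fixed."""
    try:
        return FixedStream(index).name
    except ValueError:
        # Not one of the five fixed indices: the stream must be found via
        # the named stream map or via the DBI, TPI, or IPI stream.
        return "non-fixed stream"
```

For instance, ``describe_stream(3)`` yields ``"DBI"``, whereas an index such as 7 is reported as a non-fixed stream whose meaning depends on the contents of the other streams.
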
More information about the structure of each of these streams can be found on
the following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams, including the
   Module Substreams, source file information, and CodeView symbol records
   contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for
   each compilation unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream and how it can be used to quickly
   look up records by name.

CodeView
========
CodeView is another format that comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on :doc:`CodeViewSymbols` and :doc:`CodeViewTypes` for
more information about the CodeView format.