annotate docs/PDB/index.rst @ 120:1172e4bd9c6f

update 4.0.0
author mir3636
date Fri, 25 Nov 2016 19:14:25 +0900
parents
children 803732b1fca8
=====================================
The PDB File Format
=====================================

.. contents::
   :local:

.. _pdb_intro:

Introduction
============

PDB (Program Database) is a file format invented by Microsoft that contains
debug information that can be consumed by debuggers and other tools. Since
officially supported APIs exist on Windows for querying debug information from
PDBs even without the user understanding the internals of the file format, a
large ecosystem of tools has been built for Windows to consume this format. In
order for Clang to be able to generate programs that can interoperate with these
tools, it is necessary for us to generate PDB files ourselves.

At the same time, LLVM has a long history of being able to cross-compile from
any platform to any platform, and we wish for the same to be true here. So it
is necessary for us to understand the PDB file format at the byte level so that
we can generate PDB files entirely on our own.

This manual describes what we know about the PDB file format today: the layout
of the file, the various streams contained within, the format of individual
records, and more.

We would like to extend our heartfelt gratitude to Microsoft, without whom we
would not be where we are today. Much of the knowledge contained within this
manual was learned through reading code published by Microsoft on their `GitHub
repo <https://github.com/Microsoft/microsoft-pdb>`__.

.. _pdb_layout:

File Layout
===========

.. important::
   Unless otherwise specified, all numeric values are encoded in little endian.
   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
   assume it is little endian!

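As a rough illustration of the note above, the following Python sketch decodes
a ``uint16_t`` and a ``uint64_t`` from a little-endian byte buffer. The buffer
contents are made up for the example and do not correspond to any real PDB
field.

```python
import struct

# Hypothetical 10-byte buffer: a uint16_t followed by a uint64_t,
# both stored little endian (least significant byte first).
raw = bytes([0x34, 0x12, 0xEF, 0xCD, 0xAB, 0x89, 0x67, 0x45, 0x23, 0x01])

# "<" selects little-endian byte order with no padding between fields;
# "H" is an unsigned 16-bit value and "Q" an unsigned 64-bit value.
u16, u64 = struct.unpack("<HQ", raw)

# int.from_bytes gives the same result for a single field.
u16_again = int.from_bytes(raw[:2], byteorder="little")

print(hex(u16), hex(u64))
```

Reading the same bytes as big endian would yield different values, which is
why the byte order has to be fixed up front.
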
.. toctree::
   :hidden:

   MsfFile
   PdbStream
   TpiStream
   DbiStream
   ModiStream
   PublicStream
   GlobalStream
   HashStream

.. _msf:

The MSF Container
-----------------
A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
An MSF file is a miniature "file system within a file". It contains
multiple streams (aka files) which can represent arbitrary data. Each stream
is divided into blocks, and those blocks are not necessarily laid out
contiguously within the file (that is, a stream may be fragmented).
Additionally, the MSF contains a stream directory (aka MFT, by analogy with a
file system's master file table) which describes how the streams (files) are
laid out within the MSF.

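To make the "file system within a file" idea concrete, here is a minimal
Python sketch of reassembling one fragmented stream from its blocks. It is not
a parser for the real MSF layout: the function ``read_stream``, the 4-byte
block size, and the toy directory entry are all assumptions chosen for
illustration.

```python
def read_stream(file_bytes, block_size, block_indices, stream_size):
    """Concatenate a stream's blocks in order, then trim to the stream size."""
    parts = []
    for index in block_indices:
        start = index * block_size
        parts.append(file_bytes[start:start + block_size])
    # The last block is usually only partly used, so cut off the padding.
    return b"".join(parts)[:stream_size]


# Toy "file" made of four 4-byte blocks (block size is made up).
BLOCK_SIZE = 4
toy_file = b"...." + b"DB??" + b"Hell" + b"o, P"

# Toy directory entry: the stream is 10 bytes long and lives in
# blocks 2, 3, and 1, in that order (not contiguous on disk).
stream = read_stream(toy_file, BLOCK_SIZE, [2, 3, 1], 10)
print(stream)
```

The point of the sketch is only that a reader needs a block list and a byte
length for each stream, which is the kind of information a stream directory
provides.
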
For more information about the MSF container format, stream directory, and
block layout, see :doc:`MsfFile`.

.. _streams:

Streams
-------
The PDB format contains a number of streams that describe a program's types,
symbols, source files, and compilands (e.g. object files). Additional streams
contain hash tables that debuggers and other tools use for fast lookup of
records and types by name, as well as other information about how the program
was compiled, such as the specific toolchain used. The streams contained in a
PDB file are summarized below:

+--------------------+------------------------------+-------------------------------------------+
| Name               | Stream Index                 | Contents                                  |
+====================+==============================+===========================================+
| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
+--------------------+------------------------------+-------------------------------------------+
| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
|                    |                              | - Fields to match EXE to this PDB         |
|                    |                              | - Map of named streams to stream indices  |
+--------------------+------------------------------+-------------------------------------------+
| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
|                    |                              | - Index of TPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
|                    |                              | - Indices of individual module streams    |
|                    |                              | - Indices of public / global streams      |
|                    |                              | - Section Contribution Information        |
|                    |                              | - Source File Information                 |
|                    |                              | - FPO / PGO Data                          |
+--------------------+------------------------------+-------------------------------------------+
| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
|                    |                              | - Index of IPI Hash Stream                |
+--------------------+------------------------------+-------------------------------------------+
| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
|                    |   Named Stream map           |                                           |
+--------------------+------------------------------+-------------------------------------------+
| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
|                    |   Named Stream map           |   string de-duplication                   |
+--------------------+------------------------------+-------------------------------------------+
| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
|                    | - One for each compiland     | - Line Number Information                 |
+--------------------+------------------------------+-------------------------------------------+
| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
|                    |                              | - Index of Public Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
|                    |                              | - Index of Global Hash Stream             |
+--------------------+------------------------------+-------------------------------------------+
| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+
| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
|                    |                              |   by name                                 |
+--------------------+------------------------------+-------------------------------------------+

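The fixed stream indices in the table can be written down as a small
enumeration. The following Python sketch is only a convenience for readers:
the names ``FixedStream`` and ``describe_stream`` are hypothetical and are not
part of any LLVM or Microsoft API. The index values themselves are taken from
the table.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Streams that always live at the same index (values from the table)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def describe_stream(index):
    """Return the fixed stream's name, or a note that the index is not fixed."""
    try:
        return FixedStream(index).name
    except ValueError:
        # Every other stream has to be located through another stream,
        # e.g. the PDB Stream's named stream map or the DBI Stream.
        return "not a fixed index"


for i in range(6):
    print(i, describe_stream(i))
```

Streams without a fixed index, such as ``/names`` or the per-module streams,
have to be found by first reading the stream that contains their index.
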
More information about the structure of each of these can be found on the
following pages:

:doc:`PdbStream`
   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.

:doc:`TpiStream`
   Information about the TPI stream and the CodeView records contained within.

:doc:`DbiStream`
   Information about the DBI stream and relevant substreams, including the Module Substreams,
   source file information, and CodeView symbol records contained within.

:doc:`ModiStream`
   Information about the Module Information Stream, of which there is one for each compilation
   unit, and the format of the symbols contained within.

:doc:`PublicStream`
   Information about the Public Symbol Stream.

:doc:`GlobalStream`
   Information about the Global Symbol Stream.

:doc:`HashStream`
   Information about the Hash Table stream, and how it can be used to quickly look up records
   by name.

CodeView
========
CodeView is another format that comes into the picture. While MSF defines
the structure of the overall file, and PDB defines the set of streams that
appear within the MSF file and the format of those streams, CodeView defines
the format of **symbol and type records** that appear within specific streams.
Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
more information about the CodeView format.