diff docs/PDB/index.rst @ 120:1172e4bd9c6f
update 4.0.0
author | mir3636 |
---|---|
date | Fri, 25 Nov 2016 19:14:25 +0900 |
parents | |
children | 803732b1fca8 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/PDB/index.rst Fri Nov 25 19:14:25 2016 +0900
@@ -0,0 +1,165 @@
+=====================================
+The PDB File Format
+=====================================
+
+.. contents::
+   :local:
+
+.. _pdb_intro:
+
+Introduction
+============
+
+PDB (Program Database) is a file format invented by Microsoft that contains
+debug information that can be consumed by debuggers and other tools.  Since
+officially supported APIs exist on Windows for querying debug information from
+PDBs even without the user understanding the internals of the file format, a
+large ecosystem of tools has been built for Windows to consume this format.  In
+order for Clang to be able to generate programs that can interoperate with these
+tools, it is necessary for us to generate PDB files ourselves.
+
+At the same time, LLVM has a long history of being able to cross-compile from
+any platform to any platform, and we wish for the same to be true here.  So it
+is necessary for us to understand the PDB file format at the byte level so that
+we can generate PDB files entirely on our own.
+
+This manual describes what we know about the PDB file format today: the layout
+of the file, the various streams contained within, the format of individual
+records within, and more.
+
+We would like to extend our heartfelt gratitude to Microsoft, without whom we
+would not be where we are today.  Much of the knowledge contained within this
+manual was learned through reading code published by Microsoft on their `GitHub
+repo <https://github.com/Microsoft/microsoft-pdb>`__.
+
+.. _pdb_layout:
+
+File Layout
+===========
+
+.. important::
+   Unless otherwise specified, all numeric values are encoded in little endian.
+   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
+   assume it is little endian!
+
+.. toctree::
+   :hidden:
+
+   MsfFile
+   PdbStream
+   TpiStream
+   DbiStream
+   ModiStream
+   PublicStream
+   GlobalStream
+   HashStream
+
+.. _msf:
+
+The MSF Container
+-----------------
+A PDB file is a special case of an MSF (Multi-Stream Format) file.
+An MSF file is a miniature "file system within a file".  It contains
+multiple streams (aka files) which can represent arbitrary data, and these
+streams are divided into blocks which may not necessarily be contiguously
+laid out within the file (aka fragmented).  Additionally, the MSF contains a
+stream directory (aka MFT) which describes how the streams (files) are laid
+out within the MSF.
+
+For more information about the MSF container format, stream directory, and
+block layout, see :doc:`MsfFile`.
+
+.. _streams:
+
+Streams
+-------
+The PDB format contains a number of streams that describe information such
+as the types, symbols, source files, and compilands (e.g. object files) of a
+program.  Additional streams contain hash tables that debuggers and other
+tools use to provide fast lookup of records and types by name, along with
+other information about how the program was compiled, such as the specific
+toolchain used.  A summary of the streams contained in a PDB file is as
+follows:
+
++--------------------+------------------------------+-------------------------------------------+
+| Name               | Stream Index                 | Contents                                  |
++====================+==============================+===========================================+
+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
++--------------------+------------------------------+-------------------------------------------+
+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
+|                    |                              | - Fields to match EXE to this PDB         |
+|                    |                              | - Map of named streams to stream indices  |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
+|                    |                              | - Index of TPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
+|                    |                              | - Indices of individual module streams    |
+|                    |                              | - Indices of public / global streams      |
+|                    |                              | - Section Contribution Information        |
+|                    |                              | - Source File Information                 |
+|                    |                              | - FPO / PGO Data                          |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
+|                    |                              | - Index of IPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
+|                    |   Named Stream map           |   string de-duplication                   |
++--------------------+------------------------------+-------------------------------------------+
+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
+|                    | - One for each compiland     | - Line Number Information                 |
++--------------------+------------------------------+-------------------------------------------+
+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
+|                    |                              | - Index of Public Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
+|                    |                              | - Index of Global Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+
+More information about the structure of each of these can be found on the
+following pages:
+
+:doc:`PdbStream`
+  Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
+
+:doc:`TpiStream`
+  Information about the TPI stream and the CodeView records contained within.
+
+:doc:`DbiStream`
+  Information about the DBI stream and relevant substreams including the Module Substreams,
+  source file information, and CodeView symbol records contained within.
+
+:doc:`ModiStream`
+  Information about the Module Information Stream, of which there is one for each compilation
+  unit, and the format of symbols contained within.
+
+:doc:`PublicStream`
+  Information about the Public Symbol Stream.
+
+:doc:`GlobalStream`
+  Information about the Global Symbol Stream.
+
+:doc:`HashStream`
+  Information about the Hash Table stream, and how it can be used to quickly look up records
+  by name.
+
+CodeView
+========
+CodeView is another format which comes into the picture.  While MSF defines
+the structure of the overall file, and PDB defines the set of streams that
+appear within the MSF file and the format of those streams, CodeView defines
+the format of **symbol and type records** that appear within specific streams.
+Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
+more information about the CodeView format.
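
To make the "file system within a file" description in The MSF Container section more concrete, the following is a minimal Python sketch of how a fragmented stream could be reassembled from its blocks, together with the fixed stream indices from the summary table and a little-endian read as required by the File Layout note. The names ``FixedStream``, ``read_stream`` and ``read_u16`` are hypothetical and are not part of LLVM. The block size and the per-stream block list are assumed to have already been obtained from the stream directory, whose actual layout is described in MsfFile and is not parsed here.

```python
import struct
from enum import IntEnum
from typing import Sequence


class FixedStream(IntEnum):
    """Fixed stream indices, taken from the summary table in this document."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def read_stream(file_bytes: bytes, block_size: int,
                block_indices: Sequence[int], stream_length: int) -> bytes:
    """Reassemble one stream from possibly non-contiguous blocks.

    Assumption: block_indices lists the stream's blocks in logical order,
    and block N starts at byte offset N * block_size in the file.
    """
    if stream_length > len(block_indices) * block_size:
        raise ValueError("stream is longer than the blocks assigned to it")
    chunks = []
    for index in block_indices:
        start = index * block_size
        chunks.append(file_bytes[start:start + block_size])
    # The last block is usually only partially used, so trim to the length.
    return b"".join(chunks)[:stream_length]


def read_u16(data: bytes, offset: int = 0) -> int:
    """Decode a little-endian uint16_t at the given offset."""
    return struct.unpack_from("<H", data, offset)[0]
```

For example, with a toy block size of 4, ``read_stream(b"AAAArld!BBBBHello wo", 4, [3, 4, 1], 11)`` walks blocks 3, 4 and 1 in that order and trims the result to 11 bytes. A real reader would first locate and decode the stream directory to obtain the block list for, say, ``FixedStream.DBI``.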