diff docs/PDB/index.rst @ 120:1172e4bd9c6f

update 4.0.0
author mir3636
date Fri, 25 Nov 2016 19:14:25 +0900
parents
children 803732b1fca8
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/PDB/index.rst	Fri Nov 25 19:14:25 2016 +0900
@@ -0,0 +1,165 @@
+=====================================
+The PDB File Format
+=====================================
+
+.. contents::
+   :local:
+
+.. _pdb_intro:
+
+Introduction
+============
+
+PDB (Program Database) is a file format invented by Microsoft and which contains
+debug information that can be consumed by debuggers and other tools.  Since
+officially supported APIs exist on Windows for querying debug information from
+PDBs even without the user understanding the internals of the file format, a
+large ecosystem of tools has been built for Windows to consume this format.  In
+order for Clang to be able to generate programs that can interoperate with these
+tools, it is necessary for us to generate PDB files ourselves.
+
+At the same time, LLVM has a long history of being able to cross-compile from
+any platform to any platform, and we wish for the same to be true here.  So it
+is necessary for us to understand the PDB file format at the byte-level so that
+we can generate PDB files entirely on our own.
+
+This manual describes what we know about the PDB file format today.  The layout
+of the file, the various streams contained within, the format of individual
+records within, and more.
+
+We would like to extend our heartfelt gratitude to Microsoft, without whom we
+would not be where we are today.  Much of the knowledge contained within this
+manual was learned through reading code published by Microsoft on their `GitHub
+repo <https://github.com/Microsoft/microsoft-pdb>`__.
+
+.. _pdb_layout:
+
+File Layout
+===========
+
+.. important::
+   Unless otherwise specified, all numeric values are encoded in little endian.
+   If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always
+   assume it is little endian!
+
+.. toctree::
+   :hidden:
+   
+   MsfFile
+   PdbStream
+   TpiStream
+   DbiStream
+   ModiStream
+   PublicStream
+   GlobalStream
+   HashStream
+
+.. _msf:
+
+The MSF Container
+-----------------
+A PDB file is really just a special case of an MSF (Multi-Stream Format) file.
+An MSF file is actually a miniature "file system within a file".  It contains
+multiple streams (aka files) which can represent arbitrary data, and these
+streams are divided into blocks which may not necessarily be contiguously
+laid out within the file (aka fragmented).  Additionally, the MSF contains a
+stream directory (aka MFT) which describes how the streams (files) are laid
+out within the MSF.
+
+For more information about the MSF container format, stream directory, and
+block layout, see :doc:`MsfFile`.
+
+.. _streams:
+
+Streams
+-------
+The PDB format contains a number of streams which describe various information
+such as the types, symbols, source files, and compilands (e.g. object files)
+of a program, as well as some additional streams containing hash tables that are
+used by debuggers and other tools to provide fast lookup of records and types
+by name, and various other information about how the program was compiled such
+as the specific toolchain used, and more.  A summary of streams contained in a
+PDB file is as follows:
+
++--------------------+------------------------------+-------------------------------------------+
+| Name               | Stream Index                 | Contents                                  |
++====================+==============================+===========================================+
+| Old Directory      | - Fixed Stream Index 0       | - Previous MSF Stream Directory           |
++--------------------+------------------------------+-------------------------------------------+
+| PDB Stream         | - Fixed Stream Index 1       | - Basic File Information                  |
+|                    |                              | - Fields to match EXE to this PDB         |
+|                    |                              | - Map of named streams to stream indices  |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Stream         | - Fixed Stream Index 2       | - CodeView Type Records                   |
+|                    |                              | - Index of TPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| DBI Stream         | - Fixed Stream Index 3       | - Module/Compiland Information            |
+|                    |                              | - Indices of individual module streams    |
+|                    |                              | - Indices of public / global streams      |
+|                    |                              | - Section Contribution Information        |
+|                    |                              | - Source File Information                 |
+|                    |                              | - FPO / PGO Data                          |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Stream         | - Fixed Stream Index 4       | - CodeView Type Records                   |
+|                    |                              | - Index of IPI Hash Stream                |
++--------------------+------------------------------+-------------------------------------------+
+| /LinkInfo          | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /src/headerblock   | - Contained in PDB Stream    | - Unknown                                 |
+|                    |   Named Stream map           |                                           |
++--------------------+------------------------------+-------------------------------------------+
+| /names             | - Contained in PDB Stream    | - PDB-wide global string table used for   |
+|                    |   Named Stream map           |   string de-duplication                   |
++--------------------+------------------------------+-------------------------------------------+
+| Module Info Stream | - Contained in DBI Stream    | - CodeView Symbol Records for this module |
+|                    | - One for each compiland     | - Line Number Information                 |
++--------------------+------------------------------+-------------------------------------------+
+| Public Stream      | - Contained in DBI Stream    | - Public (Exported) Symbol Records        |
+|                    |                              | - Index of Public Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| Global Stream      | - Contained in DBI Stream    | - Global Symbol Records                   |
+|                    |                              | - Index of Global Hash Stream             |
++--------------------+------------------------------+-------------------------------------------+
+| TPI Hash Stream    | - Contained in TPI Stream    | - Hash table for looking up TPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+| IPI Hash Stream    | - Contained in IPI Stream    | - Hash table for looking up IPI records   |
+|                    |                              |   by name                                 |
++--------------------+------------------------------+-------------------------------------------+
+
+More information about the structure of each of these can be found on the
+following pages:
+   
+:doc:`PdbStream`
+   Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
+
+:doc:`TpiStream`
+   Information about the TPI stream and the CodeView records contained within.
+
+:doc:`DbiStream`
+   Information about the DBI stream and relevant substreams including the Module Substreams,
+   source file information, and CodeView symbol records contained within.
+
+:doc:`ModiStream`
+   Information about the Module Information Stream, of which there is one for each compilation
+   unit and the format of symbols contained within.
+
+:doc:`PublicStream`
+   Information about the Public Symbol Stream.
+
+:doc:`GlobalStream`
+   Information about the Global Symbol Stream.
+
+:doc:`HashStream`
+   Information about the Hash Table stream, and how it can be used to quickly look up records
+   by name.
+
+CodeView
+========
+CodeView is another format which comes into the picture.  While MSF defines
+the structure of the overall file, and PDB defines the set of streams that
+appear within the MSF file and the format of those streams, CodeView defines
+the format of **symbol and type records** that appear within specific streams.
+Refer to the pages on `CodeView Symbol Records` and `CodeView Type Records` for
+more information about the CodeView format.