annotate docs/PDB/MsfFile.rst @ 121:803732b1fca8

LLVM 5.0
author kono
date Fri, 27 Oct 2017 17:07:41 +0900
parents 1172e4bd9c6f
children 3a76565eade5
120
1172e4bd9c6f update 4.0.0
mir3636
parents:
diff changeset
=====================================
The MSF File Format
=====================================

.. contents::
   :local:

.. _msf_superblock:

The Superblock
==============
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as
follows:

.. code-block:: c++

  struct SuperBlock {
    char FileMagic[sizeof(Magic)];
    ulittle32_t BlockSize;
    ulittle32_t FreeBlockMapBlock;
    ulittle32_t NumBlocks;
    ulittle32_t NumDirectoryBytes;
    ulittle32_t Unknown;
    ulittle32_t BlockMapAddr;
  };

- **FileMagic** - Must be equal to ``"Microsoft C/C++ MSF 7.00\r\n"``
  followed by the bytes ``1A 44 53 00 00 00``.
- **BlockSize** - The block size of the internal file system. Valid values are
  512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary
  depending on the block size. LLVM handles only a block size of 4KiB, and all
  further discussion assumes a block size of 4KiB.
- **FreeBlockMapBlock** - The index of a block within the file, at which begins
  a bitfield representing the set of all blocks within the file which are "free"
  (i.e. the data within that block is not used). This bitfield is spread across
  the MSF file, with one map block recurring every ``BlockSize`` blocks.
  **Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! This field
  is designed to support incremental and atomic updates of the underlying MSF
  file. While writing to an MSF file, if the value of this field is ``1``, you
  can write your new modified bitfield to block 2, and vice versa. Only when
  you commit the file to disk do you need to swap the value in the SuperBlock
  to point to the new ``FreeBlockMapBlock``.
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize``
  should equal the size of the file on disk.
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream
  directory contains information about each stream's size and the set of blocks
  that it occupies. It is described in more detail later.
- **BlockMapAddr** - The index of a block within the MSF file. At this block is
  an array of ``ulittle32_t``'s listing the blocks that the stream directory
  resides on. For large MSF files, the stream directory (which describes the
  block layout of each stream) may not fit entirely on a single block. As a
  result, this extra layer of indirection is introduced, whereby this block
  contains the list of blocks that the stream directory occupies, and the stream
  directory itself can be stitched together accordingly. The number of
  ``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``.

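The field constraints above can be checked mechanically. The following is a
minimal sketch, not LLVM's implementation: ``SuperBlockView``, ``readLE32``,
``bytesToBlocks``, ``alternateFpmBlock``, and ``parseSuperBlock`` are
hypothetical names introduced for illustration, and the check that
``BlockMapAddr`` lies inside the file is an assumed sanity check rather than
something the format description states.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  // 32-byte magic: the text "Microsoft C/C++ MSF 7.00\r\n" followed by the
  // bytes 1A 44 53 00 00 00.
  static const char Magic[] = {'M', 'i', 'c', 'r', 'o', 's', 'o', 'f', 't', ' ',
                               'C', '/', 'C', '+', '+', ' ', 'M', 'S', 'F', ' ',
                               '7', '.', '0', '0', '\r', '\n', '\x1a', 'D', 'S',
                               '\0', '\0', '\0'};

  // Host-endian copy of the SuperBlock fields (hypothetical helper type).
  struct SuperBlockView {
    uint32_t BlockSize;
    uint32_t FreeBlockMapBlock;
    uint32_t NumBlocks;
    uint32_t NumDirectoryBytes;
    uint32_t Unknown;
    uint32_t BlockMapAddr;
  };

  // Decode a little-endian 32-bit value from four bytes.
  inline uint32_t readLE32(const unsigned char *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // ceil(NumBytes / BlockSize) in integer arithmetic.
  inline uint32_t bytesToBlocks(uint32_t NumBytes, uint32_t BlockSize) {
    assert(BlockSize != 0 && "block size must be non-zero");
    return NumBytes / BlockSize + (NumBytes % BlockSize != 0 ? 1u : 0u);
  }

  // The free block map that is *not* currently live: 1 <-> 2.
  inline uint32_t alternateFpmBlock(uint32_t Current) {
    return Current == 1 ? 2u : 1u;
  }

  // Returns true and fills SB if Data starts with a plausible SuperBlock.
  inline bool parseSuperBlock(const unsigned char *Data, size_t DataSize,
                              uint64_t FileSize, SuperBlockView &SB) {
    const size_t HeaderSize = sizeof(Magic) + 6 * sizeof(uint32_t);
    if (DataSize < HeaderSize)
      return false;
    if (std::memcmp(Data, Magic, sizeof(Magic)) != 0)
      return false;

    const unsigned char *P = Data + sizeof(Magic);
    SB.BlockSize = readLE32(P);
    SB.FreeBlockMapBlock = readLE32(P + 4);
    SB.NumBlocks = readLE32(P + 8);
    SB.NumDirectoryBytes = readLE32(P + 12);
    SB.Unknown = readLE32(P + 16);
    SB.BlockMapAddr = readLE32(P + 20);

    // Only the four block sizes listed above are valid.
    if (SB.BlockSize != 512 && SB.BlockSize != 1024 && SB.BlockSize != 2048 &&
        SB.BlockSize != 4096)
      return false;
    // The free block map lives at block 1 or block 2, nowhere else.
    if (SB.FreeBlockMapBlock != 1 && SB.FreeBlockMapBlock != 2)
      return false;
    // NumBlocks * BlockSize should equal the size of the file on disk.
    if (uint64_t(SB.NumBlocks) * SB.BlockSize != FileSize)
      return false;
    // Assumed sanity check: the block map must be a block inside the file.
    return SB.BlockMapAddr < SB.NumBlocks;
  }

After a successful parse, ``bytesToBlocks(SB.NumDirectoryBytes, SB.BlockSize)``
gives the number of ``ulittle32_t`` entries to read from the block at
``SB.BlockMapAddr``, and ``alternateFpmBlock(SB.FreeBlockMapBlock)`` names the
block a writer could use for the updated bitfield before committing.
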
The Stream Directory
====================
The Stream Directory is the root of all access to the other streams in an MSF
file. Beginning at byte 0 of the stream directory is the following structure:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams;
    ulittle32_t StreamSizes[NumStreams];
    ulittle32_t StreamBlocks[NumStreams][];
  };

This structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes.
Note that each of the last two arrays is of variable length, and in particular
that the second array is jagged: row ``i`` of ``StreamBlocks`` holds one entry
for each block that stream ``i`` occupies.

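Because the rows of ``StreamBlocks`` have no explicit lengths, a reader has to
derive each row's length from the corresponding stream size. The following is
a minimal sketch under stated assumptions: the directory has already been
stitched together from its blocks and decoded into host-endian 32-bit words,
every entry of ``StreamSizes`` is a plain byte count, and
``ParsedDirectory`` and ``parseStreamDirectory`` are hypothetical names that
are not part of LLVM.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Decoded form of the stream directory (hypothetical helper type).
  struct ParsedDirectory {
    std::vector<uint32_t> StreamSizes;
    std::vector<std::vector<uint32_t>> StreamBlocks; // jagged: one row per stream
  };

  // Words holds the whole stream directory as host-endian 32-bit values.
  // Returns false if the words do not describe a self-consistent directory.
  inline bool parseStreamDirectory(const std::vector<uint32_t> &Words,
                                   uint32_t BlockSize, ParsedDirectory &Out) {
    if (Words.empty() || BlockSize == 0)
      return false;

    const size_t NumStreams = Words[0];
    if (NumStreams > Words.size() - 1)
      return false; // StreamSizes[] would run past the end of the directory

    Out.StreamSizes.assign(Words.begin() + 1, Words.begin() + 1 + NumStreams);
    Out.StreamBlocks.clear();

    size_t Pos = 1 + NumStreams; // first word of StreamBlocks
    for (uint32_t Size : Out.StreamSizes) {
      // Row length is ceil(Size / BlockSize).
      const size_t NumBlocks =
          Size / BlockSize + (Size % BlockSize != 0 ? 1 : 0);
      assert(Pos <= Words.size());
      if (NumBlocks > Words.size() - Pos)
        return false; // this row would run past the end of the directory
      Out.StreamBlocks.emplace_back(Words.begin() + Pos,
                                    Words.begin() + Pos + NumBlocks);
      Pos += NumBlocks;
    }
    // The structure occupies the directory exactly, with nothing left over.
    return Pos == Words.size();
  }

For instance, with a 4KiB block size the words ``{2, 100, 5000, 7, 8, 9}``
would decode to two streams of 100 and 5000 bytes occupying blocks ``{7}`` and
``{8, 9}`` respectively.
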
**Example:** Consider a hypothetical PDB file with a 4KiB block size and 4
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.

Stream 0: ceil(1000 / 4096) = 1 block

Stream 1: ceil(8000 / 4096) = 2 blocks

Stream 2: ceil(16000 / 4096) = 4 blocks

Stream 3: ceil(9000 / 4096) = 3 blocks

In total, 10 blocks are used. Let's see what the stream directory might look
like:

.. code-block:: c++

  struct StreamDirectory {
    ulittle32_t NumStreams = 4;
    ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
    ulittle32_t StreamBlocks[][] = {
      {4},
      {5, 6},
      {11, 9, 7, 8},
      {10, 15, 12}
    };
  };

In total, this occupies ``15 * 4 = 60`` bytes (1 word for ``NumStreams``, 4 for
``StreamSizes``, and 10 for ``StreamBlocks``), so ``SuperBlock->NumDirectoryBytes``
would equal ``60``, and the block at index ``SuperBlock->BlockMapAddr`` would
hold an array of one ``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``.

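The same arithmetic can be written down for arbitrary stream sizes. This is a
small sketch for illustration only; ``blocksFor`` and ``directoryBytes`` are
hypothetical helper names, not LLVM functions.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // ceil(NumBytes / BlockSize) in integer arithmetic.
  inline uint64_t blocksFor(uint64_t NumBytes, uint64_t BlockSize) {
    assert(BlockSize != 0 && "block size must be non-zero");
    return (NumBytes + BlockSize - 1) / BlockSize;
  }

  // Size in bytes of a stream directory describing the given streams.
  inline uint64_t directoryBytes(const std::vector<uint32_t> &StreamSizes,
                                 uint32_t BlockSize) {
    // One word for NumStreams plus one word per entry of StreamSizes.
    uint64_t Words = 1 + StreamSizes.size();
    // One StreamBlocks entry for every block a stream occupies.
    for (uint32_t Size : StreamSizes)
      Words += blocksFor(Size, BlockSize);
    return Words * sizeof(uint32_t);
  }

Applied to the four streams of the example with a block size of 4096,
``directoryBytes`` yields 60, and ``blocksFor(60, 4096)`` yields 1, the number
of entries in the array found at ``SuperBlock->BlockMapAddr``.
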
Note also that the streams are discontiguous: the blocks of stream 2 are not
in ascending order, and block 10 of stream 3 falls between blocks 9 and 11 of
stream 2. You cannot assume anything about the layout of the blocks!

Alignment and Block Boundaries
==============================
As may be clear by now, a single field (whether it be a high-level record, a
long string field, or even a single ``uint16``) can begin and end in separate
blocks. For example, if the block size is 4096 bytes and a ``uint16`` field
begins at the last byte of the current block, then it ends on the first byte
of the stream's next block. Since blocks are not necessarily laid out
contiguously in the file, both the consumer and the producer of an MSF file
must be prepared to split data apart accordingly. In this example, because MSF
fields are little-endian, the low byte of the ``uint16`` is written to the last
byte of the current block and the high byte to the first byte of the stream's
next block, which could be tens of thousands of bytes later (or even earlier!)
in the file, depending on what the stream directory says.
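
One way to cope with such splits is to translate every stream offset to a file
offset individually instead of assuming that consecutive bytes are adjacent on
disk. The following is a minimal sketch, assuming the whole file is held in
memory and the stream's block list has already been read from the stream
directory; ``streamToFileOffset`` and ``readStreamLE16`` are hypothetical
names, not LLVM APIs.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Map a byte offset within a stream to a byte offset within the file.
  // Blocks is the stream's row of StreamBlocks from the stream directory.
  inline uint64_t streamToFileOffset(const std::vector<uint32_t> &Blocks,
                                     uint32_t BlockSize, uint64_t StreamOffset) {
    const uint64_t Index = StreamOffset / BlockSize;
    assert(Index < Blocks.size() && "offset is past the end of the stream");
    return uint64_t(Blocks[Index]) * BlockSize + StreamOffset % BlockSize;
  }

  // Read a little-endian uint16 at StreamOffset. Each byte is located
  // separately, so a value that straddles two blocks is handled correctly.
  inline uint16_t readStreamLE16(const std::vector<unsigned char> &File,
                                 const std::vector<uint32_t> &Blocks,
                                 uint32_t BlockSize, uint64_t StreamOffset) {
    const uint64_t Lo = streamToFileOffset(Blocks, BlockSize, StreamOffset);
    const uint64_t Hi = streamToFileOffset(Blocks, BlockSize, StreamOffset + 1);
    assert(Lo < File.size() && Hi < File.size() && "block lies outside the file");
    return uint16_t(File[Lo] | (File[Hi] << 8));
  }

With a block list of ``{3, 1}`` and a block size of 4096, a ``uint16`` at
stream offset 4095 has its low byte at file offset 16383 (the end of block 3)
and its high byte at file offset 4096 (the start of block 1), which is earlier
in the file.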