18 January 2009

Debug Symbols for MOSA #4 - PDB File Format

I'll continue my series with this post about the PDB file format. Several places on the net already describe the file format. However, I've found inaccuracies in most of them, probably because the format itself is not open and everyone interprets the data they see in their own way. Again, take this post with a grain of salt. I'm going to talk explicitly about managed PDB files, so expect some differences compared to native code PDB files.

Test Code

Before we dive into the format itself, I want to show you the sample code I'm going to use to describe it. The following C# source is compiled with debug support in order to generate a PDB file.

    using System;

    namespace PdbExample
    {
        class Program
        {
            static void Main(string[] args)
            {
                Console.WriteLine("Hello World!");
                for (int i = 0; i < 10; i++)
                    Console.WriteLine("Count: {0}", i);
            }
        }
    }

The examples shown in this series use the C# compiler that ships with .NET Framework 3.5 (C# 3.0); however, I believe the format hasn't changed since C# 1.0.

PDB Header

PDB files start with a pretty large header of 32 bytes, which can be used to identify the file. The following dump shows a header of the resulting PDB file:

    00000000  4D 69 63 72 6F 73 6F 66 74 20 43 2F 43 2B 2B 20  Microsoft C/C++ 
    00000010  4D 53 46 20 37 2E 30 30 0D 0A 1A 44 53 00 00 00  MSF 7.00...DS...

The header has some interesting properties. First, it is an ASCII string that contains a line break followed by the byte 0x1A (Ctrl-Z, the DOS end-of-file character), and it is zero terminated. This makes it possible to pass a PDB file to the DOS command 'type' and see the version of the PDB file we have:

    D:\My Projects\Tests\CSharpConsoleBlog\CSharpConsoleBlog>type Program10.pdb
    Microsoft C/C++ MSF 7.00

    D:\My Projects\Tests\CSharpConsoleBlog\CSharpConsoleBlog>

You don't get garbage displayed on the screen, but you still get some valuable information from the file itself. The two letters DS in the header are the initials of Dan Spalding, who owned the linker and much of the PDB code for many years, according to Andy Pennell.
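The 32 header bytes from the dump above can be used as a quick sanity check before parsing anything else. The following is a minimal sketch; the class and method names are made up for illustration, and it only accepts the exact "MSF 7.00" signature shown in this post.

```csharp
using System;
using System.Text;

static class PdbSignature
{
    // The 32 bytes from the dump: the ASCII text, CR LF, 0x1A, "DS"
    // and three zero bytes. \u001A is used instead of \x1A because a
    // \x escape would swallow the following 'D' as a hex digit.
    public static readonly byte[] Expected =
        Encoding.ASCII.GetBytes("Microsoft C/C++ MSF 7.00\r\n\u001ADS\0\0\0");

    // Returns true if the buffer starts with the expected signature.
    public static bool HasPdbSignature(byte[] fileStart)
    {
        if (fileStart == null || fileStart.Length < Expected.Length)
            return false;

        for (int i = 0; i < Expected.Length; i++)
        {
            if (fileStart[i] != Expected[i])
                return false;
        }
        return true;
    }
}
```

Reading the first 32 bytes of the file and passing them to HasPdbSignature is enough; other PDB versions would need their own signature bytes, which this post does not cover.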

Next, at offset 0x00000020, a structure starts that contains a number of settings and provides a lot of information to a PDB reader:

Field       Size (bytes)  Meaning
pageSize    4             The size of a page in the file.
bitmapPage  4             The page number of the bitmap page.
filePages   4             The number of pages in the file.
rootBytes   4             The number of bytes in the root stream.
reserved    4             Unused as far as I know.
indexPage   4             The page number of the index page.
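To make the layout concrete, here is a rough sketch that reads the six fields from a buffer holding the start of the file. The class name PdbHeaderInfo is hypothetical, and the sketch assumes the fields are little-endian 32-bit integers stored back to back in the order of the table, which the table itself does not state.

```csharp
using System;

// Hypothetical container for the six fields of the structure at 0x20.
sealed class PdbHeaderInfo
{
    public int PageSize;
    public int BitmapPage;
    public int FilePages;
    public int RootBytes;
    public int Reserved;
    public int IndexPage;

    // Assumption: six little-endian 32-bit values, in table order,
    // starting right after the 32-byte signature.
    public static PdbHeaderInfo Parse(byte[] file)
    {
        const int start = 0x20;
        if (file == null || file.Length < start + 6 * 4)
            throw new ArgumentException("Buffer too short for the PDB header structure.");

        PdbHeaderInfo header = new PdbHeaderInfo();
        header.PageSize = ReadInt32LE(file, start);
        header.BitmapPage = ReadInt32LE(file, start + 4);
        header.FilePages = ReadInt32LE(file, start + 8);
        header.RootBytes = ReadInt32LE(file, start + 12);
        header.Reserved = ReadInt32LE(file, start + 16);
        header.IndexPage = ReadInt32LE(file, start + 20);
        return header;
    }

    // Reads a little-endian 32-bit integer regardless of host byte order.
    static int ReadInt32LE(byte[] buffer, int offset)
    {
        return buffer[offset]
            | (buffer[offset + 1] << 8)
            | (buffer[offset + 2] << 16)
            | (buffer[offset + 3] << 24);
    }
}
```

Reading the bytes by hand rather than through BitConverter keeps the result independent of the machine the reader runs on.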

Ok, so a PDB file is divided into fixed size pages (size in the pageSize field) and there's a bitmap that specifies if a page is in use or not. Sounds familiar? Well, yes it's the same strategy as used for the FAT file system, OLE compound files and in a lot of other areas.

The filePages field can be used to make sure the PDB file is complete: multiply pageSize by filePages and you should get the size of the PDB file.
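As a small illustration of that check, the sketch below compares the expected size with the actual one. The helper name is invented for this example, and the multiplication is done in 64 bits so that large files do not overflow.

```csharp
using System;

static class PdbSizeCheck
{
    // True if the file length matches pageSize * filePages exactly.
    public static bool IsComplete(long actualFileLength, int pageSize, int filePages)
    {
        if (pageSize <= 0 || filePages < 0)
            return false;

        long expected = (long)pageSize * filePages;
        return actualFileLength == expected;
    }
}
```

The actual length could come from new FileInfo(path).Length, with pageSize and filePages taken from the header structure.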

PDB Root Stream

Using the fields we have right now, we still can't unlock the contents of the PDB file. To get there we need to combine the rootBytes and indexPage fields. The indexPage field points to a page that contains the page numbers of the root stream. It is an array of 4-byte page numbers that lists, in order, the pages holding the contents of the root stream. To determine the number of entries in the index, divide rootBytes by pageSize and round up, since the last page may be only partially filled.
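The lookup described above could be sketched roughly as follows. All names are hypothetical, the page numbers are assumed to be little-endian, and the sketch assumes the whole index fits into the single page that indexPage points to.

```csharp
using System;

static class PdbRootStreamReader
{
    // Number of pages needed to hold byteCount bytes (rounded up).
    public static int PageCount(int byteCount, int pageSize)
    {
        return (int)(((long)byteCount + pageSize - 1) / pageSize);
    }

    // Collects the root stream from the pages listed in the index page.
    public static byte[] ReadRootStream(byte[] file, int pageSize, int rootBytes, int indexPage)
    {
        int pages = PageCount(rootBytes, pageSize);
        int indexOffset = checked(indexPage * pageSize);
        byte[] root = new byte[rootBytes];

        for (int i = 0; i < pages; i++)
        {
            // Entry i of the index is the page holding chunk i of the stream.
            int pageNumber = ReadInt32LE(file, indexOffset + 4 * i);
            int source = checked(pageNumber * pageSize);
            int target = i * pageSize;

            // The last page may be only partially used.
            int count = Math.Min(pageSize, rootBytes - target);
            Buffer.BlockCopy(file, source, root, target, count);
        }
        return root;
    }

    // Assumption: page numbers are stored as little-endian 32-bit values.
    static int ReadInt32LE(byte[] buffer, int offset)
    {
        return buffer[offset]
            | (buffer[offset + 1] << 8)
            | (buffer[offset + 2] << 16)
            | (buffer[offset + 3] << 24);
    }
}
```

For simplicity the sketch works on the whole file in memory; a real reader would more likely seek within a stream and validate each page number against filePages.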

If you read all pages in the order given by the index, you've read the root stream. The root stream tells us what is contained in the file. It starts with 4 bytes that give the total number of streams in the file. An array of stream lengths follows the stream count, i.e. each entry in this array is the length of the corresponding stream. What follows next is a page index for all streams, i.e. an array of page numbers for stream #1, an array of page numbers for stream #2, etc. The number of entries for each stream is again the length of that stream divided by the page size, rounded up.
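Put into code, walking the root stream might look like the sketch below. The names are illustrative only, the values are assumed to be little-endian 32-bit integers, and treating a non-positive length as an empty stream is a defensive assumption of this sketch rather than something described above.

```csharp
using System;

static class PdbStreamDirectory
{
    // Returns one array of page numbers per stream and reports the
    // stream lengths through the out parameter.
    public static int[][] Parse(byte[] root, int pageSize, out int[] lengths)
    {
        int pos = 0;

        // First 4 bytes: total number of streams.
        int streamCount = ReadInt32LE(root, pos);
        pos += 4;

        // Then one length per stream.
        lengths = new int[streamCount];
        for (int s = 0; s < streamCount; s++)
        {
            lengths[s] = ReadInt32LE(root, pos);
            pos += 4;
        }

        // Then the page numbers of each stream, stream after stream.
        int[][] pages = new int[streamCount][];
        for (int s = 0; s < streamCount; s++)
        {
            long length = Math.Max(0, lengths[s]);
            int pageCount = (int)((length + pageSize - 1) / pageSize);

            pages[s] = new int[pageCount];
            for (int p = 0; p < pageCount; p++)
            {
                pages[s][p] = ReadInt32LE(root, pos);
                pos += 4;
            }
        }
        return pages;
    }

    // Assumption: all values are little-endian 32-bit integers.
    static int ReadInt32LE(byte[] buffer, int offset)
    {
        return buffer[offset]
            | (buffer[offset + 1] << 8)
            | (buffer[offset + 2] << 16)
            | (buffer[offset + 3] << 24);
    }
}
```

With the page numbers in hand, a stream is read the same way as the root stream: copy its pages in order and cut the result off at the stream length.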

The root stream basically gives us an index of all available streams and how they're spread across the PDB file.

Starting with the next posting, I'll dive into the important streams in order.

