15 January 2009

Debug Symbols for MOSA #2 - Debug Symbol Formats

This continues the Debug Symbols for MOSA series, which started with "Why we need them".

As the title suggests, there are many debug symbol formats out there. Which kind of symbols gets emitted, or even worse, whether any are emitted at all, usually depends on the operating system, the compiler, the compiler switches and the linker of choice. I'm only going to talk about the Microsoft formats, as these are the ones I've actively worked with.

Microsoft alone has created a number of symbol formats over the years; even PDB files exist in several versions. This post sheds some light on which debug symbol formats are out there.

Historically, Microsoft symbol formats fall into three main kinds:
  • Pre-CodeView
  • CodeView
  • PDB
I will not dive into the Pre-CodeView era, mainly because I don't know much about it.

So why does Microsoft keep changing the symbol format? Matt Pietrek has given the answer: as the compilers and debuggers advanced, more information had to be stored in the symbol files. Some changes were driven by the 16-bit to 32-bit transition, but most can be attributed to advances in the debugger. Edit and Continue is one example.

CodeView

CodeView is a format Microsoft developed alongside the CodeView debugger, which was later integrated into the Microsoft C product line that eventually became the Visual Studio we know today. There are several revisions of CodeView, each adapting the format to the specific compiler version in use.

There's even a public specification of the CodeView format, available in various places on the internet.

Over the years the CodeView format has been stored in various containers (files): in *.dbg files up to Windows 2000, and, still in use today, inside the *.pdb files that Visual Studio compilers have emitted since around 1997. More importantly for MOSA, it is also emitted by the .NET compilers.
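To make the container story a little more concrete: an executable built with symbols usually carries an IMAGE_DEBUG_TYPE_CODEVIEW entry in its PE debug directory, and that entry's payload starts with a four-character signature. "NB10" records point to a PDB 2.0 file, "RSDS" records to a PDB 7.0 file, while signatures like "NB09" and "NB11" indicate CodeView data embedded directly (as in the old *.dbg files). The following is a minimal sketch that decodes such a payload, assuming you have already located its raw bytes in the PE file (not shown). The function name parse_codeview_record is hypothetical, and the ANSI code page for NB10 paths is an assumption (latin-1 is used here).

```python
import struct
import uuid


def parse_codeview_record(data: bytes) -> dict:
    """Decode the payload of a PE CodeView debug directory entry (sketch)."""
    if len(data) < 4:
        raise ValueError("record too short")
    sig = data[:4]
    if sig == b"RSDS":
        # CV_INFO_PDB70: 'RSDS', 16-byte GUID, uint32 age, NUL-terminated UTF-8 path
        if len(data) < 24:
            raise ValueError("truncated RSDS record")
        guid = uuid.UUID(bytes_le=data[4:20])
        (age,) = struct.unpack_from("<I", data, 20)
        path = data[24:].split(b"\0", 1)[0].decode("utf-8")
        return {"format": "PDB 7.0", "guid": str(guid).upper(), "age": age, "path": path}
    if sig == b"NB10":
        # CV_INFO_PDB20: 'NB10', uint32 offset, uint32 timestamp, uint32 age, path
        if len(data) < 16:
            raise ValueError("truncated NB10 record")
        offset, timestamp, age = struct.unpack_from("<III", data, 4)
        # Assumption: the path is in the ANSI code page; latin-1 never fails to decode.
        path = data[16:].split(b"\0", 1)[0].decode("latin-1")
        return {"format": "PDB 2.0", "offset": offset, "timestamp": timestamp,
                "age": age, "path": path}
    if sig in (b"NB09", b"NB11"):
        # CodeView data embedded directly rather than referenced in a PDB.
        return {"format": "embedded CodeView", "signature": sig.decode("ascii")}
    raise ValueError(f"unknown CodeView signature {sig!r}")
```

The GUID and age of an RSDS record are what a debugger compares against the PDB's own copy to decide whether the symbols actually match the binary.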

PDB

PDB files have been in use for quite some time now, but even this file format has gone through at least three transitions. On top of that, there's at least one format for managed symbols produced by csc, vbc and the other .NET compilers. Yes, another format. Again.

CILDB

Microsoft submitted the Common Language Infrastructure to ECMA for standardization. The latest standardized edition I'm aware of is ISO/IEC 23271, published on 2006-10-01. Partition V of this standard defines a Debug Interchange Format called CILDB. The specification is available for download.

The specification introduction says this:
Portable CILDB files provide a standard way to interchange debugging information between CLI producers and consumers. This partition serves to fill in gaps not covered by metadata, notably the names of local variables and source line correspondences.

Even though Microsoft has pushed this format as part of the specification, no Microsoft tool included with the .NET Framework SDK, the Framework itself or Visual Studio can generate or consume these files, so the interchange aspect of the standard is not realized. There are both open and closed source tools that can convert Microsoft PDB files to CILDB, all with a drawback I'll talk about in the next post.
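The "source line correspondences" from the specification quote boil down to sequence points: pairs of an IL offset and a source location. Resolving an arbitrary IL offset then means finding the last sequence point at or before it. Here is a minimal sketch using hypothetical SequencePoint and LineTable types; it is an in-memory illustration only, not the on-disk CILDB layout, and it ignores details such as hidden sequence points.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SequencePoint:
    """Hypothetical in-memory sequence point: IL offset -> source location."""
    il_offset: int
    document: str
    line: int


class LineTable:
    """Per-method table answering 'which source line covers this IL offset?'."""

    def __init__(self, points: List[SequencePoint]):
        # Keep points ordered by IL offset so binary search works.
        self._points = sorted(points, key=lambda p: p.il_offset)
        self._offsets = [p.il_offset for p in self._points]

    def lookup(self, il_offset: int) -> Optional[SequencePoint]:
        # Last sequence point whose offset is <= il_offset, if any.
        i = bisect_right(self._offsets, il_offset) - 1
        return self._points[i] if i >= 0 else None


table = LineTable([
    SequencePoint(0x00, "Program.cs", 10),
    SequencePoint(0x06, "Program.cs", 11),
    SequencePoint(0x12, "Program.cs", 13),
])
print(table.lookup(0x08).line)  # IL offset 0x08 falls between 0x06 and 0x12
```

An ahead-of-time compiler would additionally have to carry these IL offsets through to native code offsets, which this sketch does not attempt.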

More on that in the next post. However, one last format for managed code remains:

MDB

This is the Mono debug format, used by mdb and MonoDevelop. Mono has integrated support for producing and consuming these files from managed code through a SymbolWriter/SymbolReader. Talk about fun!

The MDB option is definitely one we should follow to debug applications on MOSA, but it is not one we are able to use for kernel debugging or debugging native code.

Basically this means that mosacl (our ahead-of-time compiler) must be able to read PDB, MDB and CILDB files in order to map the source code to the appropriate places in the native code. Again, more in another post.
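Before mosacl can pick the right reader, it has to tell these files apart. Here is a minimal sketch that sniffs the file header. The two PDB signatures are the well-known MSF header strings. The MDB and CILDB values are assumptions as best we recall them (the magic constant in Mono's MonoSymbolFile.cs and the "_ildb_signature" header string from ECMA-335 Partition V) and should be checked against those sources. The function names are hypothetical.

```python
import struct

PDB70_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\0\0\0"
PDB20_MAGIC = b"Microsoft C/C++ program database 2.00\r\n\x1aJG\0\0"
# ASSUMPTION: recalled from Mono's MonoSymbolFile.cs (little-endian int64 at offset 0).
MDB_MAGIC = 0x45E82623FD7FA614
# ASSUMPTION: recalled from the CILDB header in ECMA-335 Partition V.
CILDB_MAGIC = b"_ildb_signature\0"


def sniff_symbol_file(header: bytes) -> str:
    """Guess the symbol format from the first bytes of a file (sketch)."""
    if header.startswith(PDB70_MAGIC):
        return "pdb7"
    if header.startswith(PDB20_MAGIC):
        return "pdb2"
    if header.startswith(CILDB_MAGIC):
        return "cildb"
    if len(header) >= 8 and struct.unpack_from("<Q", header)[0] == MDB_MAGIC:
        return "mdb"
    return "unknown"


def sniff_path(path: str) -> str:
    # 64 bytes covers the longest signature checked above (44 bytes for PDB 2.0).
    with open(path, "rb") as f:
        return sniff_symbol_file(f.read(64))
```

Dispatching on content rather than on the file extension matters here, since both PDB generations (and some converters' output) share the *.pdb extension.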
