16 January 2009

Debug Symbols for MOSA #3 - Accessing PDB files

In the last post, I wrote about the debug symbol formats used by Microsoft in recent years. This post is about where to look if you want to access these files through the official APIs. If the world consisted only of Windows, we could stop here: we wouldn't need to understand the file format itself or be able to read the files without the APIs described below. However, the world isn't a monoculture, so I'll stick to my goal of describing the PDB format in the next posts.

Essentially, Microsoft only makes four APIs available to access debugging symbols: the Image Helper Library, the Debug Help Library, the Debug Interface Access SDK, and, as the only remaining API, the .NET System.Diagnostics.SymbolStore namespace in mscorlib.

Of all of these APIs, the Image Helper Library provides the most features, followed by the Debug Help Library. The latter is used by the Microsoft debuggers to load symbol information. While both of these libraries are regular Win32 DLLs with WINAPI entry points, the Debug Interface Access SDK provides COM objects to access the contents of symbol files. That library is very easy to work with.

As one can easily see, these three options don't work outside the Microsoft world, except maybe under Wine or ReactOS. My first hope was that the System.Diagnostics.SymbolStore namespace would be sufficient for our purpose of retrieving the symbol information, but using it quickly turns the code into a Windows-only option as well.

From Mike Stall's .NET Debugging Blog, I've taken the following snippet from the sample code of the PDB2XML tool, which uses ISymbolReader to read a PDB file and write its contents to an XML file.

        // We demand Unmanaged code permissions because we're reading from the file system and calling out to the Symbol Reader
        // @TODO - make this more specific.
        [System.Security.Permissions.SecurityPermission(
            System.Security.Permissions.SecurityAction.Demand,
            Flags = System.Security.Permissions.SecurityPermissionFlag.UnmanagedCode)]
        public static ISymbolReader GetSymbolReaderForFile(SymbolBinder binder, string pathModule, string searchPath)
        {
            // Guids for imported metadata interfaces.
            Guid dispenserClassID = new Guid(0xe5cb7a31, 0x7512, 0x11d2, 0x89, 0xce, 0x00, 0x80, 0xc7, 0x92, 0xe5, 0xd8); // CLSID_CorMetaDataDispenser
            Guid dispenserIID = new Guid(0x809c652e, 0x7396, 0x11d2, 0x97, 0x71, 0x00, 0xa0, 0xc9, 0xb4, 0xd5, 0x0c); // IID_IMetaDataDispenser
            Guid importerIID = new Guid(0x7dac8207, 0xd3ae, 0x4c75, 0x9b, 0x67, 0x92, 0x80, 0x1a, 0x49, 0x7d, 0x44); // IID_IMetaDataImport

            // First create the Metadata dispenser.
            object objDispenser;
            NativeMethods.CoCreateInstance(ref dispenserClassID, null, 1, ref dispenserIID, out objDispenser);

            // Now open an Importer on the given filename. We'll end up passing this importer straight
            // through to the Binder.
            object objImporter;
            IMetaDataDispenser dispenser = (IMetaDataDispenser)objDispenser;
            dispenser.OpenScope(pathModule, 0, ref importerIID, out objImporter);

            IntPtr importerPtr = IntPtr.Zero;
            ISymbolReader reader;
            try
            {
                // This will manually AddRef the underlying object, so we need to be very careful to Release it.
                importerPtr = Marshal.GetComInterfaceForObject(objImporter, typeof(IMetadataImport));

                reader = binder.GetReader(importerPtr, pathModule, searchPath);
            }
            finally
            {
                // Release the reference taken by GetComInterfaceForObject, even if GetReader throws.
                if (importerPtr != IntPtr.Zero)
                {
                    Marshal.Release(importerPtr);
                }
            }
            return reader;
        }

Ouch, that was a lot of code just to get the symbol reader. But wait a minute, what are the Guids and the CoCreateInstance call doing there? This screams trouble for cross-platform code... It turns out that ISymbolReader is not useful without an object that implements the IMetadataImport interface. This is a COM interface implemented by mscoree.dll, the Microsoft .NET Runtime Execution Engine. And you can't get an ISymbolReader without a SymBinder, which is not even defined in mscorlib alongside the rest of the namespace. These GUIDs and the COM classes are defined in ISymWrapper.dll, a COM interop assembly.

But that's not all: the ISymbolBinder1 interface (don't ask; look up ISymbolBinder and ISymbolBinder1 and figure out the reason for ISymbolBinder1 yourself) uses an IntPtr to access this IMetadataImport interface. Essentially you're passing a COM interface as an unmanaged pointer (or native int in CIL speak) to another unmanaged COM object.
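To make that raw-pointer hand-off a bit more concrete, here is a minimal sketch of the same AddRef/Release discipline, reduced to a plain managed object. It is not taken from the PDB2XML sample; the class name is made up for illustration, and it assumes a Windows machine with the .NET Framework, since COM interop isn't available elsewhere.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class, not part of any Microsoft sample.
static class RawComPointerDemo
{
    static void Main()
    {
        object managed = new object();
        IntPtr unknown = IntPtr.Zero;
        try
        {
            // Returns an IUnknown pointer for the object and AddRefs it.
            // From here on the runtime no longer manages that reference for us.
            unknown = Marshal.GetIUnknownForObject(managed);
            Console.WriteLine("IUnknown pointer: 0x{0:X}", unknown.ToInt64());
        }
        finally
        {
            // Balance the AddRef, exactly as the symbol reader code has to.
            if (unknown != IntPtr.Zero)
            {
                Marshal.Release(unknown);
            }
        }
    }
}
```

The IntPtr carries no type information at all, so nothing stops a caller from forgetting the Release or from handing the wrong interface to the binder.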

Somehow this is messed up. Really messed up. It looks like symbol information was an afterthought in the development of .NET and hasn't received any priority since: things have been this way ever since .NET 1.0 was released. I hope that things get better with .NET 4.0, but for some reason I doubt that.

Now we've collected lots of unusable APIs and we still can't read those PDB files anywhere outside of Windows. However, this gives us something else: all of those APIs document structures for passing symbol information to the calling application. These structures are very likely similar to what is stored on disk; at the very least, the APIs give us some hints that the format is more complex than one might think. And finally, the .NET namespace gives us some design guidelines for implementing a PDB reader/writer in plain .NET.

More in the next post.
