I’ve spent quite some time this weekend refactoring parts of the MOSA compiler and fixing things small and large, and once again I’ve stumbled over our memory model. I was refactoring our internal representation to make load and store operations explicit and broke almost all of our tests at once. Fixing them was pretty easy, except for the smaller types... Section 12.1.2 of the CLI specification (ECMA-335, Partition I) states:
“Loading from 1- or 2-byte locations (arguments, locals, fields, statics, pointers) expands to 4-byte values.”
Ouch. We’ve gone through a lot of trouble to ensure correct arithmetic on all types and have badly missed the point: all smaller integral types are handled as 4-byte values on the evaluation stack.
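The effect is visible from C# as well. Here is a minimal sketch (the class and method names are made up for illustration) showing that adding two bytes produces a 4-byte result, and that truncation only happens when the value is stored back into a 1-byte location:

```csharp
using System;

public static class SmallIntegerDemo
{
    // Both byte operands are widened to int32 before the add, so the
    // sum is a 4-byte value and is not truncated to 8 bits.
    public static int AddOnStack(byte a, byte b)
    {
        return a + b;
    }

    // Truncation happens only when the result goes back into a 1-byte
    // location, which C# makes explicit with a cast.
    public static byte AddAndStore(byte a, byte b)
    {
        return unchecked((byte)(a + b));
    }
}
```

With the inputs 200 and 100, AddOnStack yields 300, while AddAndStore yields 44 (300 modulo 256).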
Next step was to change the CIL load instructions to correctly reflect this fact and fortunately we already had the appropriate instructions in the IR. So the current state of work is that most of our tests are passing again, but not all yet. Then I started wondering about the floating point specification. Looking at the section for floating point values (12.1.3), it states:
“The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either float32 or float64, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented. An implicit widening conversion to the internal representation from float32 or float64 is performed when those types are loaded from storage. The internal representation is typically the native size for the hardware, or as required for efficient implementation of an operation.”
So for floating point types we have exactly one stack type F, but the implementation is free to choose the precision of its operations as long as it is at least as large as the storage size of the floating point type. Since we’ve spent a great deal of time on single precision arithmetic, I’m inclined to keep the reduced precision operations there. Any opinions?
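To illustrate what is at stake, here is a small sketch (names are hypothetical). It picks a case where float32 and float64 precision give different answers. The explicit cast to float forces the result back to storage precision, so the first method returns the same value no matter which internal precision the implementation chooses:

```csharp
public static class FloatPrecisionDemo
{
    // 2^24: every integer up to this value is exactly representable in float32.
    public const float Limit = 16777216f;

    // The explicit cast rounds the sum to float32 precision, whatever
    // internal precision was used for the addition itself.
    public static float AddSingle(float a, float b)
    {
        return (float)(a + b);
    }

    // Widening both operands first performs the addition in float64.
    public static double AddDouble(float a, float b)
    {
        return (double)a + (double)b;
    }
}
```

AddSingle(Limit, 1f) gives 16777216, because 16777217 is not representable in float32, while AddDouble(Limit, 1f) gives 16777217.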
I’ll continue fixing this in the next couple of days.
Labels: .NET, Compiler, MOSA
The tests are run by .NET calling through a function pointer delegate using the stdcall calling convention. This calling convention is similar to cdecl. One of the similarities is that the EBX register must be saved by the callee and restored before it returns to the caller. We didn’t do that and thus corrupted the state of the .NET runtime on Windows. The bug is fixed and a commit will follow soon.
Labels: .NET, Compiler, MOSA
After the static object allocation, I’ve finished the next step for MOSA. The compiler now emits mtable (virtual method tables) records for compiled types and is able to properly call virtual functions. The test to check these is in CallVirtFixture. I’ll add a couple more tests there to check for proper hiding, base class calls and other things - hopefully the current code already handles all of those cases sufficiently well.
In order to accomplish this I’ve had to add a fake System.Object implementation to the existing tests, as those classes wouldn’t compile anymore - the linker couldn’t create the vtable for them due to the 4 virtual methods every object inherits from System.Object: ToString, GetHashCode, Equals and Finalize.
The good thing about this is that we can now use virtual functions and overrides to write OO kernels. The downside is that every kernel has to provide at least a fake implementation of System.Object.
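For reference, these are the kinds of cases virtual call tests have to cover: an override, a base class call and a hidden member. The classes below are made up for illustration and are not taken from CallVirtFixture:

```csharp
public class Device
{
    public virtual string Describe() { return "device"; }
    public string Name() { return "device"; }
}

public class Keyboard : Device
{
    // Override: dispatched through the virtual method table,
    // with an explicit call to the base implementation.
    public override string Describe() { return "keyboard on " + base.Describe(); }

    // Hiding: a new, unrelated method that is only chosen when the
    // static type of the reference is Keyboard.
    public new string Name() { return "keyboard"; }
}

public static class VirtualCallDemo
{
    public static string DescribeViaBase()
    {
        Device d = new Keyboard();
        return d.Describe(); // virtual call, resolves to Keyboard.Describe
    }

    public static string NameViaBase()
    {
        Device d = new Keyboard();
        return d.Name(); // non-virtual call, resolves to Device.Name
    }
}
```

DescribeViaBase returns "keyboard on device", while NameViaBase returns "device" because the hidden method is bound statically.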
I’ve added a fake System.Object to the existing HelloWorld kernel.
Let’s see what Phil and Simon can come up with in Hello World, now that this is out of the way.
Labels: .NET, Compiler, MOSA
After improving our test situation on the weekend, I’ve started making good on a promise I made a long time ago. The promise was: Allocate static objects at compile time for core kernel services.
The issue with writing a managed operating system, or any operating system, is memory management and getting to the OS way of working at boot time. The core problem is that there are assumptions about objects which can’t be met easily while memory management is still being initialized on the CPU(s) the OS will run on. MOSA is facing this problem too, with one addition: Writing non-OO code in an OO language feels broken.
Classical operating systems solve this by having a reduced set of services while booting and initializing the OS services later in the OS-specific fashion. We could’ve done this too, but why go for common ground if there’s new ground to explore?
So what does this feature do? This feature detects all dynamic memory allocations happening in static constructors and allocates memory for the allocated objects at compile time in the bss segment of an executable. It replaces the call to new with a load of the address of the data segment location, making the position of the object fixed in memory relative to its load address. This allows core OS services to be written using C# classes right from the start and allows them to be used as such.
There are some limitations though: The allocated object must be fixed in size, it must not have a complex structure and the field used to store the object must have the exact same type as used for the new operator. No casts allowed.
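A sketch of the pattern this feature targets might look as follows. The class names are hypothetical and not taken from the HelloWorld kernel. The object is fixed in size, has a simple structure, and the static field has exactly the type used with new:

```csharp
// A small, fixed-size object with a simple structure.
public sealed class TextScreen
{
    public int Column;
    public int Row;

    public void Advance() { Column++; }
}

public static class KernelServices
{
    // The field type matches the allocated type exactly; no casts.
    private static readonly TextScreen screen;

    // The allocation happens in the static constructor, which is
    // where the compiler looks for candidates.
    static KernelServices()
    {
        screen = new TextScreen();
    }

    public static TextScreen Screen
    {
        get { return screen; }
    }
}
```

On a regular .NET runtime this simply allocates on the heap. The idea described above is that the compiler can reserve the space for the TextScreen instance at compile time instead.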
There are probably further limits to this feature, but I haven’t figured them out yet. It will certainly be interesting to explore our HelloWorld kernel with OO-features using the new MOSA compiler.
Oh and before I forget: The feature must be turned on explicitly on the command line. Use --enable-static-alloc (or the shorter --sa) to enable it.
I’ve updated the HelloWorld project’s CMOS and Boot classes to take advantage of this mechanism.
Labels: .NET, Compiler, MOSA
Over the past couple of days I’ve made changes to MOSA to support generics. The first and primary change was to support a scheduled compilation model.
Let me explain this: Up to these changes the MOSA compiler would just scan an assembly, locate all types and compile each method contained in them. The only exceptions to this rule were native methods, generic types and generic methods.
The revised compilation scheduler stage
However, skipping generics doesn’t work anymore - you want those generic types and methods compiled too. The issue you face is how to compile these methods and types without knowing the usage. So I basically kept the current compilation scheduler, but added the capability for the pipeline to schedule additional types and methods in the scheduler. In order to do this, there’s a new assembly compilation stage: The ICompilationSchedulerStage. This stage performs the type lookups that used to be done by the MethodCompilerBuilderStage.
The compilation scheduler now maintains a schedule of methods and types to compile and executes these in order. Once all scheduled methods and types are compiled, the entire assembly including all of its generic usages has been compiled.
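The scheduling idea can be sketched as a simple work queue. This is only an illustration under my own assumptions: the class below is hypothetical, it is not the ICompilationSchedulerStage interface, and items are plain strings instead of types and methods. The important property is that compiling one item may schedule further items, and the run only ends once the queue is empty:

```csharp
using System;
using System.Collections.Generic;

public sealed class CompilationSchedulerSketch
{
    private readonly Queue<string> pending = new Queue<string>();
    private readonly HashSet<string> seen = new HashSet<string>();

    // Schedules an item once; repeated requests for the same item are ignored.
    public void Schedule(string item)
    {
        if (seen.Add(item))
        {
            pending.Enqueue(item);
        }
    }

    // Compiles items in scheduling order. The compile callback receives
    // the scheduler so it can add newly discovered generic usages.
    public List<string> Run(Action<string, CompilationSchedulerSketch> compile)
    {
        var order = new List<string>();
        while (pending.Count > 0)
        {
            string item = pending.Dequeue();
            compile(item, this);
            order.Add(item);
        }
        return order;
    }
}
```

For example, compiling a method that calls a generic method with an int argument would schedule that instantiation, and Run would pick it up before finishing.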
Labels: .NET, Compiler, MOSA
After about 3 hours of patching and fixing the last issues I finally completed the first successful test of generics in MOSA. The following C# fragment compiles successfully and passes all tests:
    static class Test
    {
        private static T GenericMethod<T>(T value)
        {
            return value;
        }

        public static bool TestCallGenericMethodWith(int value)
        {
            return value == GenericMethod(value);
        }
    }
It may not look like much, but this test is the groundwork for all other uses of generic arguments. It has moved forward not only the compiler, but also the assembly loader, the MOSA runtime and some other additions like cleaner error messages from mosacl.
I’ll finish this test case tomorrow and push my changes to GitHub. I’ll probably need some help with that from fellow MOSA contributors.
Good night.
Labels: .NET, Compiler, MOSA
In the last post, I wrote about the debug symbol formats used by Microsoft in recent years. This post is dedicated to showing you where to look for official APIs that access these files. If the world consisted only of Windows, we could stop here: we wouldn't need to understand the file format itself or be able to read the files without the APIs described below. However, the world isn't a monoculture, so I'll keep my goal of describing the PDB format in the next posts.
Essentially Microsoft only makes four APIs available to access debugging symbols: the Image Helper Library, the Debug Help Library and the Debug Interface Access SDK are native, and the only remaining API is in the .NET System.Diagnostics.SymbolStore namespace in mscorlib.
Of all of these APIs the Image Helper Library provides the most features, followed by the Debug Help Library. The latter is used by the Microsoft debuggers to load symbol information. While both of these libraries are regular Win32 DLLs with WINAPI entry points, the Debug Interface Access SDK provides COM objects to access the contents of symbol files. The library is very easy to work with.
As one can easily see, these three options don't work outside the Microsoft world, except maybe under Wine or ReactOS. My first hope was that the System.Diagnostics.SymbolStore namespace would be sufficient for our purpose of retrieving the symbol information, but it too quickly turns the code into a Windows-only option.
From Mike Stall's .NET Debugging Blog, I've taken the following snippet from the sample code of the PDB2XML tool, which uses ISymbolReader to read a PDB file and write its contents to an XML file.
    // Requires: using System; using System.Diagnostics.SymbolStore; using System.Runtime.InteropServices;
    // NativeMethods, IMetaDataDispenser and IMetadataImport are interop declarations that are not shown in this snippet.

    // We demand Unmanaged code permissions because we're reading from the file system and calling out to the Symbol Reader
    // @TODO - make this more specific.
    [System.Security.Permissions.SecurityPermission(System.Security.Permissions.SecurityAction.Demand,
        Flags = System.Security.Permissions.SecurityPermissionFlag.UnmanagedCode)]
    public static ISymbolReader GetSymbolReaderForFile(SymbolBinder binder, string pathModule, string searchPath)
    {
        // Guids for imported metadata interfaces.
        Guid dispenserClassID = new Guid(0xe5cb7a31, 0x7512, 0x11d2, 0x89, 0xce, 0x00, 0x80, 0xc7, 0x92, 0xe5, 0xd8); // CLSID_CorMetaDataDispenser
        Guid dispenserIID = new Guid(0x809c652e, 0x7396, 0x11d2, 0x97, 0x71, 0x00, 0xa0, 0xc9, 0xb4, 0xd5, 0x0c); // IID_IMetaDataDispenser
        Guid importerIID = new Guid(0x7dac8207, 0xd3ae, 0x4c75, 0x9b, 0x67, 0x92, 0x80, 0x1a, 0x49, 0x7d, 0x44); // IID_IMetaDataImport

        // First create the Metadata dispenser (1 = CLSCTX_INPROC_SERVER).
        object objDispenser;
        NativeMethods.CoCreateInstance(ref dispenserClassID, null, 1, ref dispenserIID, out objDispenser);

        // Now open an Importer on the given filename. We'll end up passing this importer straight
        // through to the Binder.
        object objImporter;
        IMetaDataDispenser dispenser = (IMetaDataDispenser)objDispenser;
        dispenser.OpenScope(pathModule, 0, ref importerIID, out objImporter);

        IntPtr importerPtr = IntPtr.Zero;
        ISymbolReader reader;
        try
        {
            // This will manually AddRef the underlying object, so we need to be very careful to Release it.
            importerPtr = Marshal.GetComInterfaceForObject(objImporter, typeof(IMetadataImport));
            reader = binder.GetReader(importerPtr, pathModule, searchPath);
        }
        finally
        {
            if (importerPtr != IntPtr.Zero)
            {
                Marshal.Release(importerPtr);
            }
        }
        return reader;
    }
Ouch, that was a lot of code just to get the symbol reader - but wait a minute, what are the Guids and the CoCreateInstance call doing there? This screams trouble for cross-platform code... It turns out that ISymbolReader is not useful without an object that implements the IMetadataImport interface. This is a COM interface implemented by mscoree.dll, the Microsoft .NET Runtime Execution Engine. And you can't get an ISymbolReader without a SymBinder, which is not even defined in the namespace. These GUIDs and the COM classes are defined in ISymWrapper.dll, a COM interop assembly.
As if that weren't enough, the ISymbolBinder1 interface (don't ask; look up ISymbolBinder and ISymbolBinder1 and figure out the reason for ISymbolBinder1 yourself) uses an IntPtr to access this IMetadataImport interface. Essentially you're passing a COM interface in an unmanaged pointer (or native int in CIL speak) to another unmanaged COM object.
Somehow this is messed up. Really messed up. It looks like symbol information was an afterthought in the development of .NET and hasn't received any priority since: this mess has been around ever since .NET 1.0 was released. I hope that things get better with .NET 4.0, but for some reason I doubt that.
Now we've collected lots of unusable APIs and we still can't read those PDB files anywhere outside of Windows. However this gives us something else: All of those APIs have documented some structures to pass symbol information to the calling application. These structures are very likely to be similar to what is stored on disk - at least these APIs give us some hints that the format is more complex than one might think. And finally the .NET namespace gives us some design guidelines to realize a PDB reader/writer using plain .NET.
More in the next post.
Labels: .NET, Compiler, Debugging, MOSA
This continues the series on debug symbols for MOSA, which started with "Why we need them".
As the title says, there's a multitude of debug symbol formats out there. Which kind of symbols are emitted, or even worse, whether any are emitted at all, usually depends on the operating system, the compiler, the compiler switches and the linker of choice. I'm only going to talk about the Microsoft formats, as these are the ones I've actively been working with.
Microsoft has created a multitude of symbol formats in the past, and even PDB files exist in multiple formats. This post sheds some light on which debug symbols are out there.
The general history for Microsoft Symbol Formats is that there are mainly three kinds:
I will not dive into the Pre-CodeView era mainly because I don't have much knowledge about it.
So why does Microsoft change the symbol format all the time? The answer was given by Matt Pietrek: as the compilers and debuggers advanced, more information was stored in the file. Some changes were due to the 16-to-32-bit transition, but most can be attributed to advances in the debugger. Edit and Continue is an example of this.
CodeView
CodeView is a format developed by Microsoft along with the CodeView debugger, which was later integrated into Microsoft C, the product that became the Visual Studio we know today. There are several revisions of CodeView, which adapt the format to the specific compiler version in use.
There's even a public specification for the CodeView format available in various places on the internet.
The CodeView format has been stored in various containers (files) over the years, namely the *.dbg files up to Windows 2000, and it is still in use today in the *.pdb files emitted by Visual Studio compilers since around 1997. More importantly for MOSA, it is also emitted by the .NET compilers.
PDB
PDB files have been in use for quite some time now, but even this file format has gone through at least three transitions. There's at least one format for managed symbols produced by csc, vbc and the other .NET compilers - yes, another format. Again.
CILDB
Microsoft has submitted the Common Language Infrastructure to ECMA for standardization. The latest standardized edition I'm aware of is ISO/IEC 23271, published on 2006-10-01. Partition V of this standard defines a Debug Interchange Format, specifically called CILDB. The specification is available for download.
The specification introduction says this:
Portable CILDB files provide a standard way to interchange debugging information between CLI producers and consumers. This partition serves to fill in gaps not covered by metadata, notably the names of local variables and source line correspondences.
Even though Microsoft has pushed this format as part of the specification, no Microsoft tool included with the .NET Framework SDK, the Framework itself or Visual Studio is able to generate or consume these files. So the interchange aspect of this standard is not realized. There are both open and closed source apps that are able to convert Microsoft PDB files to CILDB - all with a drawback I'll talk about in the next post.
More in the next post. However one last format for managed code remains:
MDB
This is the Mono debug format - it is used by mdb and MonoDevelop. There's integrated support in Mono using a SymbolWriter/SymbolReader to produce and consume these files from managed code. Talk about fun!
The MDB option is definitely one we should follow to debug applications on MOSA, but it is not one we are able to use for kernel debugging or debugging native code.
Basically this means that mosacl (our ahead of time compiler) must be able to read PDB, MDB and CILDB files in order to map the source code to appropriate places in the native code. Again - more in another post.
Labels: .NET, Compiler, Debugging, MOSA
The MOSA compiler is nearing its 0.1 alpha release and one of the things that has been bugging me since the start is creating debug symbols for compiled assemblies. The MOSA compiler converts CIL assemblies to native code for a specific target architecture. In the process, however, the mapping of source code to native code gets lost unless the compiler is able to create new symbol information that maps the native code back to the managed source code.
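To make the idea of such a mapping concrete, here is a minimal sketch of a lookup table from native code offsets to source lines. It is purely illustrative: the class and its members are hypothetical, and it is not a description of any existing symbol format or of what mosacl emits.

```csharp
using System;
using System.Collections.Generic;

public sealed class NativeLineMapSketch
{
    // Parallel lists: offsets[i] is the first native offset that
    // belongs to source line lines[i].
    private readonly List<int> offsets = new List<int>();
    private readonly List<int> lines = new List<int>();

    // Entries must be added in strictly ascending native offset order.
    public void Add(int nativeOffset, int sourceLine)
    {
        if (offsets.Count > 0 && nativeOffset <= offsets[offsets.Count - 1])
        {
            throw new ArgumentException("Offsets must be strictly ascending.");
        }
        offsets.Add(nativeOffset);
        lines.Add(sourceLine);
    }

    // Returns the source line of the last entry at or before the
    // given native offset, or -1 if the offset precedes all entries.
    public int FindLine(int nativeOffset)
    {
        int index = offsets.BinarySearch(nativeOffset);
        if (index < 0)
        {
            // No exact match: ~index is the position of the next larger entry.
            index = ~index - 1;
        }
        return index >= 0 ? lines[index] : -1;
    }
}
```

A debugger stopping at some native offset could use a lookup like this to decide which source line to highlight.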
There are various reasons why symbol support is needed. One of them is that kernel debugging becomes a whole lot easier if the debugger allows stepping through the source code and inspecting variables (the Visual Studio experience). The other is that various .NET APIs allow creating debug symbols (CodeDOM or Reflection.Emit) or inspecting them using the System.Diagnostics.SymbolStore namespace.
As I'll explain in later posts, this is no easy task, but one that'll raise the productivity of kernel development a lot.
Labels: .NET, Compiler, Debugging, MOSA