Simple-MSIL-Compiler

This is a version of the Orange C compiler that does MSIL code generation for the .NET Framework. This is a WIP. At present it mostly supports the C language.

This version supports common RTL variables such as stdin, stdout, stderr, errno, and the variables used for the macros in ctype.h. It also supports command line arguments.
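As a rough illustration of the runtime features listed above, the following sketch touches stdout, stderr, errno, the ctype.h macros, and an argv-style argument array. The function names echo_upper and parse_int are made up for this example, and the code is plain standard C rather than anything specific to occil.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print every argument after argv[0] in upper case.
   Returns the number of arguments printed. */
int echo_upper(FILE *out, int argc, char **argv)
{
    int i;
    int printed = 0;

    for (i = 1; i < argc; ++i) {
        const char *p;
        for (p = argv[i]; *p != '\0'; ++p)
            fputc(toupper((unsigned char)*p), out);
        fputc('\n', out);
        ++printed;
    }
    return printed;
}

/* Hypothetical helper: parse a decimal integer, using errno to detect
   range errors. Returns 0 on success and -1 on failure. */
int parse_int(const char *text, long *value)
{
    char *end;

    errno = 0;
    *value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        fprintf(stderr, "not a valid integer: %s\n", text);
        return -1;
    }
    return 0;
}
```

A main(int argc, char **argv) that simply calls echo_upper(stdout, argc, argv) would exercise the command line argument support.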

This version supports marshalling of function pointers. A small helper dll called occmsil.dll is involved in creating thunks for this. This helper dll is built when you build the compiler.
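A typical case where a function pointer crosses into the unmanaged runtime is a comparison callback handed to qsort. The sketch below is ordinary C; the names compare_ints and sort_ints are illustrative, and any thunk creation is assumed to happen inside the compiler and helper dll without showing up in the source.

```c
#include <stdlib.h>

/* Comparison callback; qsort calls it through a function pointer. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sort an array in ascending order by handing compare_ints to qsort. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```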

Calling unprototyped functions now results in an error.
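For example, a function that is called before its definition needs a prototype in scope. In this sketch (all names hypothetical), removing the first declaration would leave the call in compute without a prototype, which is the situation the compiler is described as rejecting.

```c
/* Prototype: declares the parameter and return types before first use. */
int add_scaled(int base, int factor);

int compute(void)
{
    /* Accepted, because add_scaled has a prototype in scope. */
    return add_scaled(2, 10);
}

int add_scaled(int base, int factor)
{
    return base + base * factor;
}
```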

The results are undefined if you try to use some extension such as alloca.

There may be a variety of bugs.

The sources for this version are built as part of the main Orange C branch, and the main Orange C CI builds an installation setup file after each check-in.

Run the compiler occil on a simple C program (test.c is included as an example).

occil automatically loads symbols from mscorlib.

Additions to the language to support .NET

Additions to code generation to support disassembly of output files

OCC can try to generate code visible to a disassembler, e.g. DotPeek. At this point ILSpy isn't supported due to some missing language features.

There are four new command line switches to control this behavior:

/C+f generates 'fixed' statements that are required for CSC to regenerate code that takes the address of various managed constructs.

/C+s inlines string constants with LDSTR when possible, for readability.

/C+a generates Delegate references instead of attempting to handle function pointers natively.

/C+I causes uninitialized scalars and pointers to be initialized to zero.

All of these will probably be performance hits. Additionally, the way /C+s is implemented, it might leak memory e.g. in loops.

The code generated by the compiler may require runtime dlls to be used as references in the recompile, e.g. for the CSC recompile use /reference:lsmsilcrtl.dll. Also, when recompiling with csc you need /unsafe and /platform:x86.

There are known issues where some C constructs cannot be translated to C#. The main one is that if you nest case statements inside other control statements, the gotos generated will not be compilable by CSC. You need to rearrange the input code so that it does not do that. Possibly the code in question is some sort of state machine; one solution is to add new states prior to compiling the code with occil.
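The following sketch shows one reading of "case statements nested inside other control statements" and a possible rearrangement. Both functions are hypothetical and behave identically; only the first has a case label buried inside an if block, which is the shape assumed here to produce gotos that CSC cannot compile.

```c
/* Problematic shape: the label "case 2" sits inside the if block. */
int step_nested(int state, int flag)
{
    int out = 0;

    switch (state) {
    case 1:
        if (flag) {
            out += 10;
    case 2:
            out += 1;
        }
        break;
    default:
        out = -1;
        break;
    }
    return out;
}

/* Rearranged: every case label is directly inside the switch body. */
int step_flat(int state, int flag)
{
    int out = 0;

    switch (state) {
    case 1:
        if (flag)
            out += 11;  /* the 10 and the 1 from the nested version */
        break;
    case 2:
        out += 1;
        break;
    default:
        out = -1;
        break;
    }
    return out;
}
```

The rearranged version duplicates a little work in state 1 so that no label has to be jumped into from outside its enclosing block.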

Implementation Notes

This compiler will generate either a .EXE or .DLL file, or alternately a .il file suitable for viewing or compiling with ilasm. Additionally, the compiler is capable of generating object files in the familiar object-file-per-module paradigm that can be linked with a linker called netlink. This linker is also part of the package. The compiler uses an independent library dotnetpelib to create the output.

MSIL Limitations

1) Extended precision found in the long double type is missing in MSIL, so long double is synonymous with double.

2) You can't put variable length argument lists on variables which are pointers to functions and then pass them to unmanaged code. You can however use variable length argument lists on pointers to functions if you keep them managed.

3) Initialization of static non-const variables must be done programmatically rather than 'in place' the way a normal C compiler does it, so there are initialization functions generated for each module. This impacts startup performance.

This compiler will compile either an EXE or a DLL. The package generally defaults to compiling everything into the unnamed MSIL namespace; however, for interoperability with C# it is necessary to wrap the code into a namespace and an outer class. A command line switch conveniently specifies this wrapper.
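Limitation 2 can be illustrated with a pointer to a variadic function that is only ever called from the compiled C code, which is the case described as still working. The names sum_fn, sum_ints, and sum_through_pointer are invented for this sketch.

```c
#include <stdarg.h>

/* Pointer-to-function type with a variable length argument list. */
typedef int (*sum_fn)(int count, ...);

/* Adds up "count" int arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);
    for (i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}

int sum_through_pointer(void)
{
    sum_fn f = sum_ints;    /* the pointer never leaves the C code */
    return f(3, 1, 2, 3);
}
```

Handing a pointer like f to an unmanaged function is the combination the limitation rules out.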

This compiler is capable of auto-detecting DLL entry points for unmanaged DLLs, so for example you can specify on the command line that the compiler should additionally import from things like kernel32.dll and/or user32.dll. This still requires header support so that prototypes can be specified correctly however. This compiler is designed to work with the same headers that Orange C for the x86 uses.

This compiler is capable of importing static functions from .NET assemblies.

By default the compiler automatically imports the entry points for msvcrt.dll, and the occmsil.dll used for function pointer thunking. A .NET assembly lsmsilcrtl is used for runtime support - mostly it performs malloc and free in managed code and exports some functions useful for handling variable length argument lists and command lines. mscorlib is also automatically loaded and its static functions are available for use.

It is possible to have the compiler combine multiple files into a single output; in this way it also acts as a pseudo-linker. Simply specify all the input files on the command line. The compiler accepts wildcards on the command line, so you can, for example, compile all the files, link against several Win32 DLLs, and give the result an outer namespace and class that can be referenced from C#:

occil /omyoutputfile *.c /Wd /Lkernel32;user32;gdi32 /Nmynamespace.myclass

The /Wd switch means make a DLL. /Wg means Windows GUI. /Wc (the default) means Windows console. Adding l on the end of a /W switch (e.g. /Wcl) means load lscrtlil.dll as the unmanaged runtime, instead of msvcrt.dll. You might want to do this to get access to C99 and C11 functions.

The compiler will create structures and enumerations for things found in the C code, that can be used from C#. Unlike older versions, in this version pointers are mostly typed instead of being pointers to void.
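As a small sketch of the kind of C source this applies to, the enumeration, structure, and typed-pointer function below are the sort of definitions that would be exposed. All names are hypothetical, and how the generated types look from the C# side is not shown here.

```c
/* Hypothetical types; per the text above, the compiler would emit matching
   structure and enumeration definitions usable from C#. */
enum shape_kind { SHAPE_CIRCLE, SHAPE_SQUARE };

struct shape {
    enum shape_kind kind;
    int size;
};

/* Takes a typed pointer rather than a pointer to void. */
int shape_extent(const struct shape *s)
{
    /* A circle spans its diameter, a square spans its side. */
    return s->kind == SHAPE_CIRCLE ? 2 * s->size : s->size;
}
```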

This compiler will also enable C# to call managed functions with variable length argument lists.

The /L switch may now also be used to specify .NET assemblies to load. mscorlib.dll is automatically loaded by the compiler.

The switch '-d' can be used to tell the compiler to stick to plain C and not accept C#-like extensions.

Beyond that this is a C11 compiler, but some things currently aren't implemented or are implemented in a limited fashion:

1) Complex numbers aren't implemented.

2) Atomics aren't implemented.

3) Threads and thread local storage aren't implemented.

4) The runtime library is msvcrt.dll, which doesn't support the C11 or C99 additions to the C RTL.

5) Arrays aren't implemented as managed arrays but as pointers to unmanaged memory.

6) Array types are actually implemented as .NET classes.

7) Variable length argument lists are done in the C# style rather than in the C style, except during calls to unmanaged functions.

8) Variable length argument lists get marshalling performed when being passed to unmanaged code, but this only handles simple types.

9) Thunks are generated for pointers-to-functions passed between managed and unmanaged code (e.g. for qsort and for WNDPROC style functions), but when the pointers are placed in a structure you need to give the compiler a hint. Use CALLBACK in the function pointer definition and make the callback a stdcall function.

10) Marshalling is performed in the thunks for the transition from unmanaged to managed code, which are used by function pointers passed to unmanaged code, but this only handles simple types.

11) Variable length arrays and alloca are implemented with a managed memory allocator instead of with the localloc MSIL instruction.

12) Structures passed by value to functions get copied to temporary variables before the call.

13) Many compiler optimizations found in the native version of the compiler are currently turned off.

14) The compiler will not allow use of unprototyped functions.
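The hint described in item 9 might look like the sketch below, where a function pointer stored in a structure is declared with CALLBACK. The struct and function names are invented, and CALLBACK is assumed to come from windows.h as a stdcall marker; the empty fallback definition exists only so the sketch also builds on other toolchains.

```c
#ifdef _WIN32
#include <windows.h>    /* assumed to define CALLBACK as a stdcall marker */
#endif
#ifndef CALLBACK
#define CALLBACK        /* empty placeholder so the sketch builds elsewhere */
#endif

/* Function pointer type marked with CALLBACK, as item 9 asks. */
typedef int (CALLBACK *visit_fn)(int value);

struct visitor {
    visit_fn visit;     /* pointer stored in a structure */
    int calls;
};

/* The callback itself uses the same calling convention marker. */
int CALLBACK double_it(int value)
{
    return value * 2;
}

/* Invokes the stored callback and counts the call. */
int run_visitor(struct visitor *v, int value)
{
    v->calls++;
    return v->visit(value);
}
```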