MSDN Magazine - February 2008 - (Page 69)

Association (ECMA) specification (available at msdn2.microsoft.com/aa569283).

The CLR's abstract stack performs operations on more than just integers. It has a rich type system, including strings, integers, booleans, floats, doubles, and so on. In order for my language to run safely on the CLR and interoperate with other .NET-compliant languages, I incorporate some of the CLR type system into my own program. Specifically, the Good for Nothing language defines two types, numbers and strings, which I map to System.Int32 and System.String.

The Good for Nothing compiler makes use of a Base Class Library (BCL) component called System.Reflection.Emit to deal with IL code generation and .NET assembly creation and packaging. It's a low-level library, which sticks close to the bare metal by providing simple code-generation abstractions over the IL language. The library is also used in other well-known BCL APIs, including System.Xml.XmlSerializer.

The high-level classes that are required to create a .NET assembly (shown in Figure 8) somewhat follow the builder software design pattern, with builder APIs for each logical .NET metadata abstraction. The AssemblyBuilder class is used to create the PE file and set up the necessary .NET assembly metadata elements like the manifest. The ModuleBuilder class is used to create modules within the assembly. TypeBuilder is used to create Types and their associated metadata. MethodBuilder and LocalBuilder deal with adding methods to types and locals to methods, respectively. The ILGenerator class is used to generate the IL code for methods, utilizing the OpCodes class, which exposes every IL instruction as a static field. All of these Reflection.Emit classes are used in the Good for Nothing code generator.

Figure 8 Reflection.Emit Libraries Used to Build a .NET Assembly

The Code Generator

The code generator for the Good for Nothing compiler relies heavily on the Reflection.Emit library to produce an executable .NET assembly. I will describe and analyze the important parts of the class; the other bits are left for you to peruse at your leisure.

The CodeGen constructor, which is shown in Figure 9, sets up the Reflection.Emit infrastructure, which is required before I can start emitting code. I begin by defining the assembly name and passing that to the assembly builder. In this example, I use the source file name as the assembly name. Next is the ModuleBuilder defi

Figure 9 CodeGen Constructor

    Emit.ILGenerator il = null;
    Collections.Dictionary<string, Emit.LocalBuilder> symbolTable;

    public CodeGen(Stmt stmt, string moduleName)
    {
        if (Path.GetFileName(moduleName) != moduleName)
        {
            throw new Exception("can only output into current directory!");
        }

        AssemblyName name = new AssemblyName(Path.GetFileNameWithoutExtension(moduleName));
        Emit.AssemblyBuilder asmb = AppDomain.CurrentDomain.DefineDynamicAssembly(name, Emit.AssemblyBuilderAccess.Save);
        Emit.ModuleBuilder modb = asmb.DefineDynamicModule(moduleName);
        Emit.TypeBuilder typeBuilder = modb.DefineType("Foo");

        Emit.MethodBuilder methb = typeBuilder.DefineMethod("Main", Reflect.MethodAttributes.Static, typeof(void), System.Type.EmptyTypes);

        // CodeGenerator
        this.il = methb.GetILGenerator();
        this.symbolTable = new Dictionary<string, Emit.LocalBuilder>();

        // Go Compile
        this.GenStmt(stmt);

        il.Emit(Emit.OpCodes.Ret);
        typeBuilder.CreateType();
        modb.CreateGlobalFunctions();
        asmb.SetEntryPoint(methb);
        asmb.Save(moduleName);
        this.symbolTable = null;
        this.il = null;
    }

Tools for Getting Your IL Right

Even the most seasoned compiler hackers make mistakes at the code-generation level. The most common bug is bad IL code that leaves the stack unbalanced. The CLR will typically throw an exception when bad IL is found (either when the assembly is loaded or when the IL is JITed, depending on the trust level of the assembly).

Diagnosing and repairing these errors is simple with an SDK tool called peverify.exe. It performs a verification of the IL, making sure the code is correct and safe to execute. For example, here is some IL code that attempts to add the number 10 to the string "bad":

    ldc.i4 10
    ldstr "bad"
    add

Running peverify over an assembly that contains this bad IL will result in the following error:

    [IL]: Error: [C:\MSDNMagazine\Sample.exe : Sample::Main][offset 0x0000002][found ref 'System.String'] Expected numeric type on the stack.

In this example, peverify reports that the add instruction expected two numeric types where it instead found an integer and a string.

ILASM (IL assembler) and ILDASM (IL disassembler) are two SDK tools you can use to compile text IL into .NET assemblies and decompile assemblies out to IL, respectively. ILASM allows for quick and easy testing of IL instruction streams that will be the basis of the compiler output. You simply create the test IL code in a text editor and feed it in to ILASM. Meanwhile, the ILDASM tool lets you quickly peek at the IL a compiler has generated for a particular code path. This includes the IL that commercial compilers emit, such as the C# compiler. It offers a great way to see the IL code for statements that are similar between languages; in other words, the IL flow control code generated for a C# for loop could be reused by other compilers that have similar constructs.
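The builder chain described above (AssemblyBuilder, ModuleBuilder, TypeBuilder, MethodBuilder, ILGenerator) can be tried in isolation with a minimal sketch like the one below. It is not part of the Good for Nothing compiler: the names BuilderChainSketch, RunAddTen, SketchAssembly, SketchModule, and Sketch are hypothetical, and it assumes AssemblyBuilderAccess.Run in place of Save so that nothing is written to disk and the emitted method can be invoked directly. The emitted IL is a type-correct counterpart to the peverify example: it pushes two integers before the add instruction, so the stack is balanced at ret.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical illustration; none of these names come from the article's compiler.
public static class BuilderChainSketch
{
    // Emits the equivalent of "static int AddTen(int x) { return x + 10; }"
    // into an in-memory assembly, then invokes it through reflection.
    public static int RunAddTen(int x)
    {
        // Assumption: Run access instead of Save, so no PE file is produced.
        var name = new AssemblyName("SketchAssembly");
        AssemblyBuilder asmb =
            AssemblyBuilder.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run);
        ModuleBuilder modb = asmb.DefineDynamicModule("SketchModule");
        TypeBuilder typeBuilder = modb.DefineType("Sketch", TypeAttributes.Public);
        MethodBuilder methb = typeBuilder.DefineMethod(
            "AddTen",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            new[] { typeof(int) });

        ILGenerator il = methb.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);     // push the argument       (stack: x)
        il.Emit(OpCodes.Ldc_I4, 10);  // push the constant 10    (stack: x, 10)
        il.Emit(OpCodes.Add);         // pop two ints, push sum  (stack: x + 10)
        il.Emit(OpCodes.Ret);         // return the one value left on the stack

        // The type must be created before its methods can be looked up and called.
        Type sketch = typeBuilder.CreateType();
        return (int)sketch.GetMethod("AddTen").Invoke(null, new object[] { x });
    }
}
```

Calling BuilderChainSketch.RunAddTen(32) should return 42. Replacing the Ldc_I4 instruction with an Ldstr of a string literal would reproduce the kind of integer-plus-string mismatch that peverify reports in the sidebar.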