MSDN Magazine - February 2008 - (Page 63)

ductivity (such as Python, which aims to get the most out of every line of code), to specialization (such as Verilog, which is a hardware description language used by processor manufacturers), to simply satisfying the author’s personal preferences. (The creator of Boo, for instance, likes the .NET Framework but is not happy with any of the available languages.)

Once you’ve stated the purpose, you can design the language; think of this as the blueprint for the language. Computer languages must be very precise so that the programmer can express exactly what is required and so that the compiler can understand, and generate executable code for, exactly what is expressed.

The blueprint of a language must be specified precisely enough to remove ambiguity during a compiler’s implementation. For this, you use a metasyntax, which is a syntax used to describe the syntax of languages. There are quite a few metasyntaxes around, so you can choose one according to your personal taste. I will specify the Good for Nothing language using a metasyntax called EBNF (Extended Backus-Naur Form). It’s worth mentioning that EBNF has very reputable roots: it’s linked to John Backus, winner of the Turing Award and lead developer on FORTRAN. A deep discussion of EBNF is beyond the scope of the article, but I can explain the basic concepts.

The language definition for Good for Nothing is shown in Figure 1. According to my language definition, a statement (stmt) can be a variable declaration, an assignment, a for loop, a read of an integer from the command line, or a print to the screen, and statements can be specified many times, separated by semicolons. Expressions (expr) can be strings, integers, arithmetic expressions, or identifiers. Identifiers (ident) must begin with an alphabetic character, which can be followed by further characters or numbers. And so on.
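To make the lexical rules just described more concrete, here is a minimal Python sketch (not part of the Good for Nothing compiler itself) that writes the identifier, integer, and string rules as regular expressions. The helper name classify is hypothetical, and the sketch assumes that "alphabetic character" means an ASCII letter and that a string is any run of characters other than a double quote enclosed in double quotes; the article does not pin these details down.

```python
import re

# Assumption: "alphabetic character" means an ASCII letter.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # a letter, then letters or digits
INT = re.compile(r"[0-9]+")                   # one or more digits
STRING = re.compile(r'"[^"]*"')               # assumed: no embedded double quotes


def classify(lexeme: str) -> str:
    """Return the name of the rule that the whole lexeme matches.

    Hypothetical helper for illustration only. Keywords such as `var`
    would also match the identifier rule; a real scanner would check
    for keywords first.
    """
    if IDENT.fullmatch(lexeme):
        return "ident"
    if INT.fullmatch(lexeme):
        return "int"
    if STRING.fullmatch(lexeme):
        return "string"
    return "unknown"


if __name__ == "__main__":
    for sample in ["ntimes", "42", '"Developers!"', "9lives"]:
        print(sample, "->", classify(sample))
```

A lexeme such as 9lives matches none of the rules because an identifier must start with a letter and an integer may contain only digits.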
Quite simply, I’ve defined a language syntax that provides for basic arithmetic capabilities, a small type system, and simple, console-based user interaction. You might have noticed that this language definition is short on specificity. I haven’t specified how large a number can be (for example, whether it can exceed a 32-bit integer) or even whether it can be negative. A true EBNF definition would precisely define these details, but, for the sake of conciseness, I will keep my example here simple.

Here is a sample Good for Nothing language program:

    var ntimes = 0;
    print "How much do you love this company? (1-10) ";
    read_int ntimes;
    var x = 0;
    for x = 0 to ntimes do
        print "Developers!";
    end;
    print "Who said sit down?!!!!!";

You can compare this simple program with the language definition to get a better understanding of how the grammar works. And with that, the language definition is done.

Figure 1 Good for Nothing Language Definition

    <stmt> := var <ident> = <expr>
            | <ident> = <expr>
            | for <ident> = <expr> to <expr> do <stmt> end
            | read_int <ident>
            | print <expr>
            | <stmt> ; <stmt>

    <expr> := <string>
            | <int>
            | <arith_expr>
            | <ident>

    <arith_expr> := <expr> <arith_op> <expr>
    <arith_op> := + | - | * | /

    <ident> := <char> <ident_rest>*
    <ident_rest> := <char> | <digit>

    <int> := <digit>+
    <digit> := 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

    <string> := " <string_elem>* "
    <string_elem> := <any char other than ">

High-Level Architecture

A compiler’s job is to translate high-level tasks created by the programmer into tasks that a computer processor can understand and execute. In other words, it will take a program written in the Good for Nothing language and translate it into something that the .NET CLR can execute. A compiler achieves this through a series of translation steps, breaking down the language into parts that we care about and throwing away the rest. Compilers follow common software design principles: loosely coupled components, called phases, are plugged together to perform translation steps.

Figure 2 illustrates the components that perform the phases of a compiler: the scanner, parser, and code generator. In each phase, the language is broken down further, and that information about the program’s intention is served to the next phase.

Compiler geeks often abstractly group the phases into front end and back end. The front end consists of scanning and parsing, while the back end typically consists of code generation. The front end’s job is to discover the syntactic structure of a program and translate that from text into a high-level in-memory representation called an Abstract Syntax Tree (AST), which I will elaborate on shortly. The back end has the task of taking the AST and converting it into something that can be executed by a machine. The three phases are usually divided into a front end and a back

Figure 2 The Compiler’s Phases
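The three phases named in Figure 2 can be sketched in a few lines of Python. This is only an illustration of how loosely coupled phases hand information to one another, not the compiler this article builds: it handles just print statements with string or integer literals, and the function names (scan, parse, generate, compile_source), the Print node, and the ("push", ...) / ("print",) instructions are all made up for the example. In particular, the back end here emits a toy instruction list rather than anything the .NET CLR can execute.

```python
import re
from dataclasses import dataclass

# One token per match: a string literal, an integer, a word, or a symbol.
TOKEN = re.compile(
    r'\s*(?:(?P<string>"[^"]*")|(?P<int>[0-9]+)'
    r'|(?P<word>[A-Za-z_][A-Za-z0-9_]*)|(?P<sym>[;=]))'
)


def scan(source):
    """Phase 1 (front end): turn characters into (kind, text) tokens."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup          # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens


@dataclass
class Print:
    """AST node for `print <literal>` (hypothetical, for this sketch only)."""
    value: object


def parse(tokens):
    """Phase 2 (front end): turn tokens into an AST, here a list of Print nodes."""
    ast, i = [], 0
    while i < len(tokens):
        if tokens[i] != ("word", "print"):
            raise SyntaxError(f"expected 'print', got {tokens[i][1]!r}")
        if i + 1 >= len(tokens):
            raise SyntaxError("expected a literal after 'print'")
        kind, text = tokens[i + 1]
        if kind == "string":
            value = text[1:-1]          # drop the surrounding quotes
        elif kind == "int":
            value = int(text)
        else:
            raise SyntaxError(f"expected a literal, got {text!r}")
        ast.append(Print(value))
        i += 2
        if i < len(tokens):             # statements are separated by ';'
            if tokens[i] != ("sym", ";"):
                raise SyntaxError("expected ';'")
            i += 1
    return ast


def generate(ast):
    """Phase 3 (back end): turn the AST into toy stack-machine instructions."""
    code = []
    for node in ast:
        code.append(("push", node.value))
        code.append(("print",))
    return code


def compile_source(source):
    """Plug the phases together: scanner -> parser -> code generator."""
    return generate(parse(scan(source)))
```

Each phase depends only on the output of the one before it, so in this sketch the generate function could be swapped for a different back end without touching scan or parse.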