A journey of 603 pages...

What is a language "implementation"?

That feeling of "I have no idea what this is" is exciting for me; it's what led me to pick up Crafting Interpreters by Robert Nystrom.

I already have a decent understanding of what an interpreter is, what it does, why we need it, and how it differs from a compiler.

It turns out that the differences between interpreted and compiled languages are somewhat fuzzy. I might dive into this topic in a later post...

What's more important to understand is how languages are implemented.

That is, how do we take a language's human-readable code and convert it into something a CPU can understand?

The Map of a Language Implementation

In the first two chapters of his book, Robert took me through the high-level components of a language's implementation.

You've got the lexer which takes in the raw text of the language and tokenizes it. These tokens are like the individual words and punctuation of a natural language.
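To make the lexer step concrete, here is a minimal sketch in Rust. It assumes a tiny arithmetic subset rather than Lox's real token set, and the names `Token` and `tokenize` are made up for illustration, not taken from the book.

```rust
/// Hypothetical token kinds for a tiny arithmetic subset (not Lox's full set).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Identifier(String),
    Plus,
    Minus,
    Star,
    Slash,
    LeftParen,
    RightParen,
}

/// Walk the raw text one character at a time and group characters into tokens.
pub fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace separates tokens but produces none.
            chars.next();
        } else if c.is_ascii_digit() {
            // Collect a run of digits and dots, then let Rust parse it.
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_ascii_digit() || d == '.') {
                    break;
                }
                text.push(d);
                chars.next();
            }
            let value: f64 = text
                .parse()
                .map_err(|_| format!("invalid number '{}'", text))?;
            tokens.push(Token::Number(value));
        } else if c.is_alphabetic() || c == '_' {
            // Identifiers: a letter or underscore followed by letters, digits, underscores.
            let mut name = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_alphanumeric() || d == '_') {
                    break;
                }
                name.push(d);
                chars.next();
            }
            tokens.push(Token::Identifier(name));
        } else {
            // Single-character punctuation.
            let token = match c {
                '+' => Token::Plus,
                '-' => Token::Minus,
                '*' => Token::Star,
                '/' => Token::Slash,
                '(' => Token::LeftParen,
                ')' => Token::RightParen,
                other => return Err(format!("unexpected character '{}'", other)),
            };
            chars.next();
            tokens.push(token);
        }
    }
    Ok(tokens)
}
```

Calling `tokenize("1 + x * 2")` yields five tokens: a number, a plus, an identifier, a star, and another number. The lexer knows nothing about precedence or meaning yet; that is the parser's job.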

Then the parser takes these tokens and builds an abstract syntax tree that captures how the tokens work together to do something (i.e. the behavior of the program). Essentially, the parser structures the tokens according to the grammar of the language.
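As a rough illustration of how a parser applies a grammar, here is a small recursive-descent sketch in Rust. It assumes a toy grammar with only numbers, `+`, and `*`; the `Token`, `Expr`, and `Parser` types are hypothetical and much simpler than anything a real Lox parser needs.

```rust
/// Hypothetical tokens for a toy grammar: numbers, '+', and '*'.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Plus,
    Star,
}

/// The abstract syntax tree: each node records an operation and its operands.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Binary {
        left: Box<Expr>,
        op: Token,
        right: Box<Expr>,
    },
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// expression := term ( "+" term )*
    pub fn expression(&mut self) -> Result<Expr, String> {
        let mut left = self.term()?;
        while self.peek() == Some(&Token::Plus) {
            self.pos += 1; // consume '+'
            let right = self.term()?;
            left = Expr::Binary {
                left: Box::new(left),
                op: Token::Plus,
                right: Box::new(right),
            };
        }
        Ok(left)
    }

    /// term := number ( "*" number )*
    /// Parsing '*' one level below '+' is what makes it bind tighter.
    fn term(&mut self) -> Result<Expr, String> {
        let mut left = self.number()?;
        while self.peek() == Some(&Token::Star) {
            self.pos += 1; // consume '*'
            let right = self.number()?;
            left = Expr::Binary {
                left: Box::new(left),
                op: Token::Star,
                right: Box::new(right),
            };
        }
        Ok(left)
    }

    fn number(&mut self) -> Result<Expr, String> {
        match self.tokens.get(self.pos) {
            Some(Token::Number(n)) => {
                let n = *n;
                self.pos += 1;
                Ok(Expr::Number(n))
            }
            other => Err(format!("expected a number, found {:?}", other)),
        }
    }

    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }
}
```

Given the tokens for `1 + 2 * 3`, `Parser::new(tokens).expression()` builds a tree whose root is the `+`, with the `2 * 3` multiplication nested as its right child. The flat token list has become a structure that reflects the grammar.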

Next comes static analysis. This is where we take the structure provided by the parser, resolve each name in the code to the declaration it refers to, perform type checks if the language calls for them, and finally store this context somewhere so that it can be used in later steps.
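One piece of static analysis, name resolution, can be sketched in a few lines of Rust. This is a simplified illustration under my own assumptions: the `Stmt`, `Expr`, and `Resolver` types are hypothetical, and the "context" it stores is just a list of how many scopes outward each variable use had to look to find its declaration.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily reduced syntax tree for illustration.
pub enum Expr {
    Number(f64),
    Variable(String),
    Add(Box<Expr>, Box<Expr>),
}

pub enum Stmt {
    Let(String, Expr), // declare a variable with an initializer
    Print(Expr),
    Block(Vec<Stmt>),  // a block opens a new scope
}

pub struct Resolver {
    /// Stack of scopes; the innermost scope is last.
    scopes: Vec<HashSet<String>>,
    /// For each variable use, in order: (name, scopes walked outward to find it).
    pub resolved: Vec<(String, usize)>,
    pub errors: Vec<String>,
}

impl Resolver {
    pub fn new() -> Self {
        Resolver {
            scopes: vec![HashSet::new()], // start with the global scope
            resolved: Vec::new(),
            errors: Vec::new(),
        }
    }

    pub fn resolve_stmt(&mut self, stmt: &Stmt) {
        match stmt {
            Stmt::Let(name, init) => {
                // Resolve the initializer first, then make the name visible.
                self.resolve_expr(init);
                self.scopes
                    .last_mut()
                    .expect("there is always a global scope")
                    .insert(name.clone());
            }
            Stmt::Print(expr) => self.resolve_expr(expr),
            Stmt::Block(body) => {
                self.scopes.push(HashSet::new());
                for inner in body {
                    self.resolve_stmt(inner);
                }
                self.scopes.pop();
            }
        }
    }

    fn resolve_expr(&mut self, expr: &Expr) {
        match expr {
            Expr::Number(_) => {}
            Expr::Variable(name) => {
                // Search from the innermost scope outward.
                let hops = self
                    .scopes
                    .iter()
                    .rev()
                    .position(|scope| scope.contains(name));
                match hops {
                    Some(h) => self.resolved.push((name.clone(), h)),
                    None => self.errors.push(format!("undefined variable '{}'", name)),
                }
            }
            Expr::Add(left, right) => {
                self.resolve_expr(left);
                self.resolve_expr(right);
            }
        }
    }
}
```

The point of the sketch is that all of this happens before the program runs: an undefined variable is reported as an error without executing a single statement.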

Once static analysis is complete, we need to represent the code in a way that is agnostic about the CPU architecture it will run on. We need an intermediate representation (IR).

This way, if I write an interpreter for my language, the front end only has to be written once, and I can run any code written in that language on x86, ARM, or any other architecture.

With our code expressed as an IR, we can apply optimizations to it. For example, we can use constant folding to find calculations that depend only on constant values and replace them with the resulting value, so `2 * 3` simply becomes `6`. This saves the program from needing to re-calculate the value each time it runs.
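Constant folding is small enough to sketch directly. The version below is a minimal illustration in Rust that works on a hypothetical expression tree (`Expr` and `fold_constants` are names chosen for this example) instead of a real IR, but the idea carries over.

```rust
/// Hypothetical expression tree standing in for an IR.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Number(f64),
    Variable(String),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Fold children first, then collapse a node if both operands are now constants.
pub fn fold_constants(expr: Expr) -> Expr {
    match expr {
        Expr::Add(left, right) => match (fold_constants(*left), fold_constants(*right)) {
            (Expr::Number(a), Expr::Number(b)) => Expr::Number(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(left, right) => match (fold_constants(*left), fold_constants(*right)) {
            (Expr::Number(a), Expr::Number(b)) => Expr::Number(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        // Numbers and variables are already as simple as they can be.
        other => other,
    }
}
```

Folding `(2 * 3) + x` gives `6 + x`: the multiplication is computed once, ahead of time, while the addition stays because `x` is only known when the program runs.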

Finally, we generate code that can be executed by the CPU (or by a virtual one in the case of a VM) and implement a runtime that includes features like garbage collection, runtime type checking, and exception handling.
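To show what "generating code for a virtual CPU" can mean, here is a minimal sketch in Rust of a compiler that emits instructions for a stack-based VM, plus the VM loop that runs them. The instruction set (`Op`), `compile`, and `run` are invented for this example and are far smaller than a real bytecode format.

```rust
/// Hypothetical expression tree to compile.
pub enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// A made-up instruction set for a tiny stack machine.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    Push(f64),
    Add,
    Mul,
}

/// Code generation: emit operands first, then the operator (postfix order).
pub fn compile(expr: &Expr, code: &mut Vec<Op>) {
    match expr {
        Expr::Number(n) => code.push(Op::Push(*n)),
        Expr::Add(left, right) => {
            compile(left, code);
            compile(right, code);
            code.push(Op::Add);
        }
        Expr::Mul(left, right) => {
            compile(left, code);
            compile(right, code);
            code.push(Op::Mul);
        }
    }
}

/// The "virtual CPU": execute instructions one by one against a value stack.
pub fn run(code: &[Op]) -> Result<f64, String> {
    let mut stack: Vec<f64> = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add | Op::Mul => {
                // The right operand was pushed last, so it is popped first.
                let b = stack.pop().ok_or("stack underflow")?;
                let a = stack.pop().ok_or("stack underflow")?;
                stack.push(if *op == Op::Add { a + b } else { a * b });
            }
        }
    }
    stack.pop().ok_or_else(|| "empty stack".to_string())
}
```

Compiling `1 + 2 * 3` produces `Push(1), Push(2), Push(3), Mul, Add`, and running that sequence leaves `7` on the stack. Because the VM is just a Rust program, the same instructions run anywhere Rust compiles.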

Great! So now I know all of the high-level components I need to build into the interpreter.

So, what am I building exactly?

With the help of Nystrom's book, I'll build an interpreter for a fictitious language called Lox. To build this interpreter, I'll use the Rust programming language!

But before I start, I need to first understand the intended structure, syntax, and grammar of Lox.

In the next installment, I'll share what I think of Lox and what I expect the challenges will be when building the interpreter for it.

Until then!