I started reading Crafting Interpreters by Bob Nystrom a few days ago. It is really a good start if you are looking for a practical introduction to compilers. It takes you through design and implementation of a tiny language Lox.
Bob Nystrom uses Java to implement the interpreter. Though I am comfortable with Java, I wanted to implement the interpreter in Python in parallel and use his Java code to verify my implementation. This helped me to understand the concepts better. As you may have expected, Python version was more concise than the corresponding Java implementation.
Code Generation Script
It is a common practice to use scripts to generate repetitive code that would be a tedious job to write otherwise. in 5th Chapter of the book, Bob follows the same approach to generate java classes for representing expression types. He uses a script to generate class definitions from class name and a list of member field names and types.
defineAst(outputDir, "Expr", Arrays.asList(
"Binary : Expr left, Token operator, Expr right",
"Grouping : Expr expression",
"Literal : Object value",
"Unary : Token operator, Expr right"
));
The above snippet generates
abstract class Expr {
static class Binary extends Expr {
Binary(Expr left, Token operator, Expr right) {
this.left = left;
this.operator = operator;
this.right = right;
}
@Override
<R> R accept(Visitor<R> visitor) {
return visitor.visitBinaryExpr(this);
}
final Expr left;
final Token operator;
final Expr right;
}
// rest of the class definitions
}
Use of code generation scripts introduces additional steps to the compilation process.
- compile the code generation script
- run the script to generate code
- compile the application code including generated code
Code generation with Metaprogramming
Lox’s python implementations listed at Project Wiki
follows the same approach to generate the code. An alternative way is to use
Metaprogramming.
Metaprogramming is a programming technique in which a piece of code manipulates
some other code. Languages like C has macros for this purpose. Python comes
with a number of tools that enables Metaprogramming. type
class, Metaclasses
and eval
are some. I am using type
to create python classes at runtime.
Create class with type
In Python classes are also objects. Instances of type
class. Creating a
class essentially means instantiating an object of type
. A simple class
definition.
class A():
pass
is a syntactic sugar to
A = type('A', (), {})
Let’s take a closer look at the arguments of type
. The first argument is class
name which is A
here. The seconds argument is a tuple of classes that newly
created class will extend. So another class B
that extends A
can be
created as follows.
type('B', (A,), {})
The third argument to type function is attribute mapping. Class attributes can be provided there. For example,
class C(A):
attr1 = 2
attr2 = 3
corresponds to
C = type('C', (A,), {"attr1": 2, "attr2": 3})
In Python, methods are also class attributes. In the following snippet,
class D():
bar = 1
def foo(bar):
print("foo", bar)
both foo
and bar
are attribute to class D
. The difference is that bar
is
an instance of int
class where foo
is an instance of function
class. So
def foo(bar):
print("foo")
class D():
pass
D.bar = 1
D.foo = foo
means the same thing to the interpreter.
Method invocation in Python
In Python object model, methods are bound to classes, not objects (functions can be bound to instances as well, let’s ignore that for now). When a method is invoked on an object, python search for the method name in attribute mapping of objects’ class and the classes it inherit from. Once the attribute is found, it is invoked with the object as the first parameter (self !). In a nutshell,
d = D()
d.foo()
is equivalent to
d = D()
D.foo(d)
Create functions at runtime
Now we know how to create a class at runtime and bind existing functions to it. That has some limitation. We need to define a constructor and a accept method for each child class.
Or can we create functions at runtime ?. Yes. closures for the rescue. Closure is function that remembers its enclosed environment.
def create_adder(addend):
def adder(n):
return n + addend
return adder
adder10 = create_adder(10)
adder20 = create_adder(20)
assert adder10(5) == 15
assert adder20(5) == 25
The function returned from create_adder
“remembers” its enclosed environment
and hence the value of addend. This allows us to create functions at runtime and
control its behavior.
Lets build it
We are fully armed to create classes and functions at runtime. Lets build one.
Expr = type("Expr", (), {})
Binary = type("Binary", (Expr,), {})
def build_init(fields):
def init(self, *args):
for field, val in zip(fields, args):
setattr(self, field, val)
Binary.__init__ = build_init(["left", "operator", "right"])
Note that we need to bind the return value of type
to a name explicitly. Where
as class definitions does that automatically. This brings some interesting issue.
Classes created inside a block (function/loop) won’t available in the outer
scope which makes it pretty much useless. We can inject the binding to globals
manually to overcome this limitation.
def build_init(fields):
def init(self, *args):
for field, val in zip(fields, args):
setattr(self, field, val)
def build_class(name, base, fields):
base_class = globals()[base]
klass = type(name, (base_class,), {})
klass.__init__ = build_init(fields)
globals()[name] = klass
build_class("Expr", None, [])
build_class("Binary", "Expr", ["left", "operator", "right"])
Though binding injections works fine, I prefer using a proxy object to access
auto generated classes. The proxy class maintains a mapping of names to classes
and leverages __getattr__
method to expose them as its attributes.
class Proxy():
def __init__(self):
self._map = {}
def __getattr__(self, name):
try:
return self._map[name]
except KeyError:
raise AttributeError
def add_mapping(self, name, klass):
self._map[name] = klass
proxy = Proxy()
With the proxy class, build_class
function can be modified to
def build_class(name, base, fields):
base_class = getattr(proxy, base)
klass = type(name, (base_class,), {})
klass.__init__ = build_init(fields)
proxy.add_mapping(name, class)
That’s it!. Now you can import proxy
and access the mapped classes as its
attributes proxy.Expr
and proxy.Binary
. I tried to keep the snippets above
simple and concise. You can find the actual implementation
here.