Principles of Programming Languages Design and Implementation
Building a HULK Interpreter and Compiler
Preface
This book is primarily about making compilers, but it is also so much more. A compiler is one of the most exciting (and complex) projects you could attempt, and one of the most interesting pieces of software you can examine. Building a compiler requires a combination of deep theoretical foundations, robust software engineering practices, and clever algorithm design and optimization. In a way, a compiler is the quintessential Computer Science application. This is why, in the process of building a compiler from scratch, you can learn a great deal about many interrelated areas in Computer Science.
But why do we need compilers at all? You see, there is a large distance between the level of reasoning that occurs in the brain and the level of reasoning that occurs on a computer—at least, modern, traditional electronic computers like the one where you’re reading this. Compilers are our best tools so far to bridge this gap.
About this Book
This book is conceived as an immersive, hands-on journey into the architecture and engineering of a modern compiler. Rather than focusing solely on abstract automata theory or formal grammar definitions, we delve into the practical challenges of building a complete, functioning compiler for a language with rich features. The central project of this work is the HULK (Havana University Language for Kompilers) programming language, a domain-specific yet general-purpose language designed at the University of Havana to teach the intricacies of language design and implementation.
HULK is a multi-paradigm language that blends the expressive power of functional programming with the structural organization of object-oriented systems and the control flow of imperative languages. By implementing HULK, we explore a wide array of compiler construction topics, including advanced parsing strategies, sophisticated type inference and static analysis, the lowering of high-level constructs into intermediate representations, and the design of a specialized runtime environment. This project-centric approach ensures that every theoretical concept is immediately reinforced by a concrete implementation, providing a holistic view of how the various components of a compiler interact to transform human-readable code into efficient machine-executable instructions.
Our goal is to guide you through the process of building not just a toy compiler, but a robust system that includes a high-level frontend, a typed intermediate representation, and a custom virtual machine named Banner. This multi-layered architecture allows us to discuss cross-cutting concerns such as memory management, virtual method dispatch, and the trade-offs between static and dynamic typing in a realistic context.
Literate Programming
A key pedagogical pillar of this book is the use of Literate Programming, a methodology introduced by Donald Knuth that treats programs as works of literature designed to be read by humans. In this paradigm, the primary focus is on the explanation of the logic and design decisions, with the source code serving as a secondary, integrated component. By intertwining the narrative prose with the implementation, we ensure that the “why” and “how” of every function and class are clearly articulated alongside the code itself.
Technically, this is achieved through the use of Quarto (.qmd) files, which allow us to write high-quality technical documentation in Markdown while embedding executable code blocks. To bridge the gap between this literary format and the executable source files required by compilers and runtimes, we employ a tool called illiterate. This tool performs a process known as “tangling,” where it scans the .qmd files and extracts the code blocks into their respective source files in the src/ directory.
This approach offers several advantages. First, it guarantees that the code presented in the book is identical to the code that is actually compiled and tested, preventing the documentation and implementation from drifting apart. Second, it allows us to present the compiler’s implementation in a logical, narrative order that facilitates learning, rather than being constrained by the file structure required by the programming language or the build system.
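The tangling step described above can be illustrated with a minimal sketch. It is not the actual illiterate implementation: it assumes a hypothetical convention in which the first line of a code block is a comment of the form `#| file: <path>` naming the target file, and the names `tangle`, `write_tangled`, and `TARGET_PREFIX` are invented for illustration. Blocks without the annotation are treated as illustrative only and skipped; blocks that share a target are concatenated in document order.

```python
from collections import defaultdict
from pathlib import Path

FENCE = "`" * 3               # Markdown code fence marker
TARGET_PREFIX = "#| file: "   # hypothetical annotation, not illiterate's real syntax


def tangle(document: str) -> dict[str, str]:
    """Map each target path to the concatenated code blocks annotated for it."""
    chunks: dict[str, list[str]] = defaultdict(list)
    in_block = False       # are we currently inside a fenced block?
    expect_target = False  # is the next line the first line of a block?
    target = None          # file the current block is written to, if any

    for line in document.splitlines():
        if line.startswith(FENCE):
            # A fence line toggles between prose and code.
            in_block = not in_block
            expect_target = in_block
            target = None
        elif in_block:
            if expect_target:
                # Only the first line of a block may name a target file.
                expect_target = False
                if line.startswith(TARGET_PREFIX):
                    target = line[len(TARGET_PREFIX):].strip()
                    continue
            if target is not None:
                chunks[target].append(line)

    return {path: "\n".join(lines) + "\n" for path, lines in chunks.items()}


def write_tangled(files: dict[str, str], root: str) -> None:
    """Write each tangled file under `root`, creating directories as needed."""
    for relative_path, source in files.items():
        out = Path(root) / relative_path
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(source, encoding="utf-8")
```

Calling `write_tangled(tangle(text), ".")` on the contents of a chapter would, under these assumptions, populate the annotated paths below the current directory. Because the narrative order of the blocks determines the order of the code in each output file, the prose is free to introduce definitions in whatever sequence best serves the explanation.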
How to Run the Code
To manage the complexity of this project and ensure a smooth development experience, we leverage a suite of modern tools for orchestration and dependency management. The HULK project is architecturally divided into a high-level frontend implemented in Python and a low-level virtual machine, Banner, implemented in Rust. This choice of languages allows us to benefit from Python’s flexibility and rapid prototyping capabilities for the compiler’s frontend, while relying on Rust’s performance and safety guarantees for the runtime environment.
- Orchestration: A makefile serves as the central orchestration hub, automating the various stages of the development lifecycle. From tangling the source code to running the test suite and generating the final documentation, the makefile provides a consistent interface for managing the project’s complex workflows.
- Python Frontend: We utilize uv for Python dependency management. uv is a high-performance Python package installer and resolver that ensures a reproducible environment for the HULK compiler’s frontend, managing dependencies and virtual environments with speed and precision.
- Rust Backend (Banner VM): The cargo ecosystem is used for the Banner virtual machine. cargo handles everything from dependency resolution to the compilation of the Rust source code, ensuring that the VM is built with optimal performance and correctness.
For those wishing to interact with the project directly, the following primary commands are available from the root of the repository:
- make tangle: Invokes illiterate to extract the source code from the .qmd files, populating the src/hulk/ and src/banner/ directories.
- make test: Executes the comprehensive test suite, which includes unit and integration tests for both the Python frontend (via pytest) and the Rust backend (via cargo test).
- make preview: Launches a Quarto preview server, rendering the book’s content into a searchable, hyperlinked HTML format, complete with formatted mathematics and syntax-highlighted code.
By following this structured approach, you will be able to follow along with the development of HULK, verify the results for yourself, and perhaps even extend the language with your own features and optimizations.