Appendix A — The HULK Programming Language

In this final part of the book, we present a straightforward and comprehensive definition of the HULK programming language, along with a set of possible extensions. This part formalizes the language we’ve been working on the entire book, and should serve as a reference for anyone wanting to build their own HULK compiler.

A.1 HULK in a nutshell

HULK (Havana University Language for Kompilers) is a didactic, type-safe, object-oriented and incremental programming language, designed for the course Introduction to Compilers in the Computer Science major at University of Havana.

A simple “Hello World” in HULK looks like this:

print("Hello World");

In a bird’s eye view HULK is an object-oriented programming language, with simple inheritance, polymorphism, and encapsulation at the class level. Also, in HULK it is possible to define global functions outside the scope of all classes. It is also possible to define a single global expression that constitutes the entry point to the program.

Most of the syntactic constructions in HULK are expressions, including conditional instructions and cycles. HULK is a statically typed language with optional type inference, which means that some (or all) parts of a program can be annotated with types, and the compiler will verify the consistency of all operations.

A.1.1 A didactic language

The HULK language has been designed as a mechanism for learning and evaluating a college course about compilers. For this reason, certain language design decisions respond more to didactic questions than to theoretical or pragmatic questions. An illustrative example is the inclusion of a single basic numerical type. In practice, programming languages have several numeric types (int, float, double, decimal) to cover the wide range of trade-off between efficiency and expressivity. However, from the didactic point of view, it is enough complexity to have to deal with a numerical type, and the inclusion of others does not bring anything new from our point of view.

Another important decision is the static typing with type inference, which will be explained later in detail. The motivation behind this feature is to allow students to first implement an evaluator for the language, and then worry about type verification. Likewise, the decision to have global expressions, global functions, and classes, responds to the need to introduce the various elements of language little by little. By having global expressions, it is possible to implement an expression interpreter without the need to solve context-sensitive problems. Later, students can implement functions and finally the object-oriented features. In this way students can learn on the fly as they add characteristics to the language, always having a valid subset of the language implemented.

A.1.2 An incremental language

As its name indicates, HULK is a huge language. Actually, the HULK language really is not really a single programming language, but a set of programming languages. That is, HULK is designed as a set of layers, each with a new language feature that add increasingly more complex functionalities on top of the previous layers. It starts with a basic syntax for expressions, then global functions, and then a unified type system with simple inheritance. Afterwards, HULK grows to contain arrays, delegates, type inference, iterators, among other characteristics. All these language features have been designed to be compatible with each other. Furthermore, each language feature clearly describes on which other language features it depends.

This design has been conceived to allow the use of HULK at a wide range of learning levels. As a language of expressions and functions, it is useful for introductory courses on parsing and basic compilation techniques. Object orientation introduces a whole universe of semantic complexities; however, the HULK type system is simple enough to illustrate the most common problems in semantic type verification. Vectors introduce problems related to memory management, while anonymous functions and iterators are fundamentally problems of transpilation and code generation. The inference of types and the verification of null-safety is an exercise in logical inference, which can be used in advanced courses. The idea is that each course defines its objectives of interest, and can use an appropriate subset of HULK to illustrate and evaluate them.

A.2 Expressions

HULK is ultimately an expression-based language. Most of the syntactic constructions in HULK are expressions, including the body of all functions, loops, and any other block of code.

The body of a program in HULK always ends with a single global expression (and, if necessary, a final semicolon) that serves as the entrypoint of the program. This means that, of course, a program in HULK can consist of just one global expression.

For example, the following is a valid program in HULK:

42;

Obviously, this program has no side effects. A slightly more complicated program, probably the first one that does something, is this:

print(42);

In this program, print refers to a builtin function that prints the result of any expression in the output stream. We will talk about functions in a later section.

The rest of this section explains the basic expressions in HULK.

A.2.1 Arithmetic expressions

HULK defines three types of literal values: numbers, strings, and booleans. We will leave strings and booleans for later.

Numbers are 32-bit floating-point and support all basic arithmetic operations with the usual semantics: + (addition), - (subtraction), * (multiplication), `(floating-point division),^` (power), and parenthesized sub-expressions.

The following is a valid HULK program that computes and prints the result of a rather useless arithmetic expression:

print((((1 + 2) ^ 3) * 4) / 5);

All usual syntactic and precedence rules apply.

A.2.2 Strings

String literals in HULK are defined within enclosed double-quotes ("), such as in:

print("Hello World");

A double-quote can be included literally by escaping it:

print("The message is "Hello World"");

Other escaped characters are for line endings, and for tabs.

Strings can be concatenated with other strings (or the string representation of numbers) using the @ operator:

print("The meaning of life is " @ 42);

A.2.3 Builtin math functions and constants

Besides print, HULK also provides some common mathematical operations encapsulated as builtin functions with their usual semantics. The list of builtin math functions is the following:

  • sqrt(<value>) computes the square root if a value.
  • sin(<angle>) computes the sine of an angle in radians.
  • cos(<angle>) computes the cosine of an angle in radians.
  • exp(<value>) computes the value of e raised to a value.
  • log(<base>, <value>) computes the logarithm of a value in a given base.
  • rand() returns a random uniform number between 0 and 1 (both inclusive).

Besides these functions, HULK also ships with two global constants: PI and E which represent the floating-point value of these mathematical constants.

As expected, functions can be nested in HULK (provided the use of types is consistent, but so far all we care about is functions from numbers to numbers, so we can forget about types until later on). Hence, the following is a valid HULK program.

print(sin(2 * PI) ^ 2 + cos(3 * PI / log(4, 64)));

More formally, function invocation is also an expression in HULK, so everywhere you expect an expression you can also put a call to builtin function, and you can freely mix arithmetic expressions and mathematical functions, as you would expect in any programming language.

A.2.4 Expression blocks

Anywhere an expression is allowed (or almost), you can also use an expression block, which is nothing but a series of expressions between curly braces ({ and }), and separated by ;.

The most trivial usage of expression blocks is to allow multiple print statements as the body of a program. For example, the following is a valid HULK program:

{
    print(42);
    print(sin(PI/2));
    print("Hello World");
}

When you use an expression block instead of a single expression, it is often not necessary to end with a semicolon (;), but it is not erroneous to do so either.

A.3 Functions

HULK also lets you define your own functions (of course!). A program in HULK can have an arbitrary number of functions defined before the final global expression (or expression block).

A function’s body is always an expression (or expression block), hence all functions have a return value (and type), that is, the return value (and type) of its body.

A.3.1 Inline functions

The easiest way to define a function is the inline form. Here’s an example:

function tan(x) => sin(x) / cos(x);

An inline function is defined by an identifier followed by arguments between parenthesis, then the => symbol, and then a simple expression (not an expression block) as body, ending in ;.

In HULK, all functions must be defined before the final global expression. All these functions live in a single global namespace, hence it is not allowed to repeat function names. Similarly, there are no overloads in HULK (at least in “basic” HULK).

Finally, the body of any function can use other functions, regardless of whether they are defined before or after the corresponding function. Thus, the following is a valid HULK program:

function cot(x) => 1 / tan(x);
function tan(x) => sin(x) / cos(x);

print(tan(PI) ** 2 + cot(PI) ** 2);

And of course, inline functions (and any other type of function) can call themselves recursively.

A.3.2 Full-form functions

Since inline functions only allow for a single expression as body (as complex as that may be), HULK also allows full-form functions, in which the body is an expression block.

Here’s an example of a rather useless function that prints 4 times:

function operate(x, y) {
    print(x + y);
    print(x - y);
    print(x * y);
    print(x / y);
}

Note that the following form is discouraged for stylistic reasons:

function id(<args>) => {
    // ...
}

That is, you should either use the inline form with => and a simple expression, or the full form with {} and an expression block.

A.4 Variables

Variables in HULK are lexically-scoped, which means that their scope is explicitely defined by the syntax. You use the let expression to introduce one or more variables and evaluate an expression in a new scope where those variables are defined.

The simplest form is introducing a single variable and using a single expression as body.

let msg = "Hello World" in print(msg);

Here msg is a new symbol that is defined only within the expression that goes after in.

A.4.1 Multiple variables

The let expression admits defining multiple variables at once like this:

let number = 42, text = "The meaning of life is" in
    print(text @ number);

This is semantically equivalent to the following long form:

let number = 42 in
    let text = "The meaning of life is" in
        print(text @ number);

As you can notice, let associates to the right, so the previous is also equivalent to:

let number = 42 in (
    let text = "The meaning of life is" in (
            print(text @ number)
        )
    );

A.4.2 Scoping rules

Since the binding is performed left-to-right (or equivalently starting from the outer let), and every variable is effectively bound in a new scope, you can safely use one variable when defining another:

let a = 6, b = a * 7 in print(b);

Which is equivalent to (and thus valid):

let a = 6 in
    let b = a * 7 in
        print(b);

A.4.3 Expression block body

You can also use an expression block as the body of a let expression:

let a = 5, b = 10, c = 20 in {
    print(a+b);
    print(b*c);
    print(c/a);
}

As we said before, semicolons (;) are seldom necessary after an expression block, but they are never wrong.

A.4.4 The let return value

As with almost everything in HULK, let is an expression, so it has a return value, which is obviously the return value of its body. This means the following is a valid HULK program:

let a = (let b = 6 in b * 7) in print(a);

Or more directly:

print(let b = 6 in b * 7);

This can be of course nested ad infinitum.

A.4.5 Redefining symbols

In HULK every new scope hides the symbols from the parent scope, which means you can redefine a variable name in an inner let expression:

let a = 20 in {
    let a = 42 in print(a);
    print(a);
}

The previous code prints 42 then 20, since the inner let redefines the value of a inside its scope, but the value outside is still the one defined by the outer let.

And because of the scoping rules, the following is also valid:

let a = 7, a = 7 * 6 in print(a);

Which is equivalent to:

let a = 7 in
    let a = 7 * 6 in
        print(a);

A.4.6 Destructive assignment

Most of the time in HULK you won’t need to overwrite a variable, but there are cases where you do. In those cases, you can use the destructive assignment operator :=, like this:

let a = 0 in {
    print(a);
    a := 1;
    print(a);
}

The previous program prints 0 and then 1, since the value of a is overwritten before the second print. This is the only way in which a variable can be written to outside of a let.

As you would expect, the := operator defines an expression too, which returns the value just assigned, so you can do the following:

let a = 0 in
    let b = a := 1 in {
        print(a);
        print(b);
    };

This is useful if you want to evaluate a complex expression to both test it (e.g, to se if its greater than zero) and store it for later use.

A.4.7 Rules for naming identifiers

Variables (and identifiers in general) in HULK can be named with any sequence of alphanumeric characters, plus underscore _, but must always begin with a letter (not a digit or _), hence the following are all valid identifiers:

  • x
  • x0
  • x_0
  • lowercase
  • TitleCase
  • snake_case
  • camelCase

The following are invalid HULK identifiers:

  • _x
  • x+y
  • some method
  • 8ball

And many others of course!

Since starting with an underscore _ is invalid in user-produced HULK code, you will notice that when we talk about transpilation in HULK, variables and identifiers in transpiled code always start with _.

A.5 Conditionals

The if expression allows evaluating different expressions based on a condition.

let a = 42 in if (a % 2 == 0) print("Even") else print("odd");

Since if is itself an expression, returning the value of the branch that evaluated true, the previous program can be rewritten as follows:

let a = 42 in print(if (a % 2 == 0) "even" else "odd");

Conditions are just expressions of boolean type. The following are the valid boolean expressions:

  • Boolean literals: true and false.
  • Arithmetic comparison operators: <, >, <=, >=, ==, !=, with their usual semantics.
  • Boolean operators: & (and), | (or), and ! (not) with their usual semantics.

A.5.1 Expression blocks in conditionals

The body of the if or the else part of a conditional (or both) can be an expression block as well:

let a = 42 in
    if (a % 2 == 0) {
        print(a);
        print("Even");
    }
    else print("Odd");

A.5.2 Multiple branches

The if expression supports multiple branches with the elif construction, which introduces another conditioned branch:

let a = 42, let mod = a % 3 in
    print(
        if (mod == 0) "Magic"
        elif (mod % 3 == 1) "Woke"
        else "Dumb"
    );

A.6 Loops

HULK defines two kinds of loops, the while expression and the for expression. Both loop constructions are expressions, returing the value of the

A.6.1 The while loop

A while loop evaluates a condition and its body while the condition is true. The body can be a simple expression or an expression block.

let a = 10 in while (a >= 0) {
    print(a);
    a := a - 1;
}

Since the return value of the while loop is the return value of its expression body, it can often be used directly as the body of a function.

function gcd(a, b) => while (a > 0)
    let m = a % b in {
        b := a;
        a := m;
    };

A.6.2 The for loop

A for loop iterates over an iterable of elements of a certain type. We will talk about iterables later on, but for now it suffices to say that if some expression evaluates to a collection, then the for loop can be used to iterate it.

For example, the builtin range(<start>, <end>) function evaluates to an iterable of numbers between <start> (inclusive) and <end> (non-inclusive).

for (x in range(0, 10)) print(x);

The for loop is semantically and operationally equivalent to the following:

let iterable = range(0, 10) in
    while (iterable.next())
        let x = iterable.current() in
            print(x);

In fact, what the reference implementation of the HULK compiler does in for loops is to transpile them into their while equivalent. This also effectively means that, just like the while loop, the for loop returns the last value of its body expression.

A.7 Types

HULK is ultimately an object-oriented language with simple inheritance and nominal typing. It also has features of structural typing via protocols, which support language features such as iterables, which we will explain later.

This section explains the basics of HULK’s nominal typing system.

A type in HULK is basically a collection of attributes and methods, encapsulated under a type name. Attributes are always private, which means they can’t be read or writen to from any code outside the type in which they are defined (not even inheritors), while methods are always public and virtual.

A.7.1 Declaring types

A new type is declared using the type keyword followed by a name, and a body composed of attribute definitions and method definitions. All attributes must be given an initialization expression. Methods, like functions, can have a single expression or an expression block as body;

type Point {
    x = 0;
    y = 0;

    getX() => self.x;
    getY() => self.y;

    setX(x) => self.x := x;
    setY(y) => self.y := y;
}

The body of every method is evaluated in a namespace that contains global symbols plus an especial symbol named self that references the current instance. The self symbol is not a keyword, which means it can be hidden by a let expression, or by a method argument.

However, when referring to the current instance, self is not a valid assignment target, so the following code should fail with a semantic error:

type A {
    // ...
    f() {
        self := new A(); // <-- Semantic error, `self` is not a valid assignment target
    }
}

A.7.2 Instantiating types

To instantiate a type you use the keyword new followed by the type name:

let pt = new Point() in
    print("x: " @ pt.getX() @ "; y: " @ pt.getY());

As you can see, type members are accessed by dot notation (instance.member).

You can pass arguments to a type, that you can use in the initialization expressions. This achieves an effect similar to having a single constructor.

type Point(x, y) {
    x = x;
    y = y;

    // ...
}

Then, at instantiation time, you can pass specific values:

let pt = new Point(3,4) in
    print("x: " @ pt.getX() @ "; y: " @ pt.getY());

Each attribute initialization expression is evaluated in a namespace that contains the global symbols and the type arguments, but no the self symbol. This means you cannot use other attributes of the same instance in an attribute initialization expression. This also means that you cannot assume any specifc order of initialization of attributes.

A.7.3 Inheritance

Types in HULK can inherit from other types. The base of the type hierarchy is a type named Object which has no public members, which is the type you implicitely inherit from by default. To inherit from a specific type, you use the inherits keyword followed by the type name:

type PolarPoint inherits Point {
    rho() => sqrt(self.getX() ^ 2 + self.getY() ^ 2);
    // ...
}

By default, a type inherits its parent type arguments, which means that to construct a PolarPoint you have to pass the x and y that Point is expecting:

let pt = new PolarPoint(3,4) in
    print("rho: " @ pt.rho());

If you want to define a different set of type arguments, then you have to provide initialization expressions for the parent type at the declaration:

type PolarPoint(phi, rho) inherits Point(rho * sin(phi), rho * cos(phi)) {
    // ...
}

During construction, the expressions for type arguments of the parent are evaluated in a namespace that contains global symbols plus the type arguments of the inheritor. Like before, you cannot assume a specific order of evaluation.

In HULK, the three builtin types (Number, String, and Boolean) implicitely inherit from Object, but it is a semantic error to inherit from these types.

A.7.4 Polymorphism

All type methods in HULK are virtual by definition, and can be redefined by an inheritor provided the exact same signature is used:

type Person(firstname, lastname) {
    firstname = firstname;
    lastname = lastname;

    name() => self.firstname @@ self.lastname;
}

NOTE: @@ is equivalent to @ " " @. It is a shorthand to insert a whitespace between two concatenated strings. There is no @@@ or beyond, we’re not savages.

type Knight inherits Person {
    name() => "Sir" @@ base();
}

let p = new Knight("Phil", "Collins") in
    print(p.name()); // prints 'Sir Phil Collins'

The base symbol in every method refers to the implementation of the parent (or the closest ancestor that has an implementation).

A.8 Type checking

HULK is a statically-typed language with optional type annotations. So far you haven’t seen any because HULK has a powerful type inference system which we will talk about later on. However, all symbols in HULK have a static type, and all programs in HULK are statically checked during compilation.

Tye annotations can be added anywhere a symbol is defined, that is:

  • in variable declarations with let expressions;
  • in function or method arguments and return type;
  • in type attributes; and,
  • in type arguments.

Let’s see an example of each case.

A.8.1 Typing variables

Variables can be explicitely type-annotated in let expressions with the following syntax:

let x: Number = 42 in print(x);

The type checker will verify that the type inferred for the initialization expression is compatible with (formally, conforms to) the annotated type.

A.8.2 Typing functions and methods

All or a subset of a function’s or method’s arguments, and its return value, can be type-annotated with a similar syntax:

function tan(x: Number): Number => sin(x) / cos(x);

On the declaration side, the type checker will verify that the body of the method uses the types in a way that is consistent with their declaration. The exact meaning of this consistency is defined in the section about type semantics. The type checker will also verify that the return type of the body conforms to the annotated return type.

On the invocation side, the type checker will verify that the values passed as parameters conform to the annotated types.

Inside methods of a type T, the implicitly defined self symbol is always assumed as if annotated with type T.

A.8.3 Typing attributes and type arguments

In type definitions, attributes and type arguments can be type-annotated as follows:

type Point(x: Number, y: Number) {
    x: Number = x;
    y: Number = y;

    // ...
}

The type checker will verify that type arguments are used consistently inside attribute initialization expressions, and that the inferred type for each attribute initialization expression conforms to the attribute annotation.

A.8.4 Type conforming

The basic type relation in HULK is called conforming (<=). A type T1 is said to conform to to another type T2 (writen as T1 <= T2) if a variable of type T2 can hold a value of type T1 such that every possible operation that is semantically valid with T2 is guaranteed to be semantically valid with T1.

In general, this means that the type checker will verify that the inferred type for any expression conforms to the corresponding type declared for that expression (e.g., the type of a variable, or the return type of a function).

The following rules provide an initial definition for the conforming relationship. The formal definition is given in the section about type semantics.

  • Every type conforms to Object.
  • Every type conforms to itself.
  • If T1 inherits T2 then T1 conforms to T2.
  • If T1 conforms to T2 and T2 conforms to T3 then T1 conforms to T3.
  • The only types that conform to Number, String, and Boolean, are respectively those same types.

Types in HULK form a single hierarchy rooted at Object. In this hierarchy the conforming relationship is equivalent to the descendant relationship. Thus, if T1 conforms to T2 that means that T1 is a descendant of T2 (or trivially the same type). Thus, we can talk of the lowest common ancestor of a set of types T1, T2, …, Tn, which is the most specific type T such that all Ti conform to T. When two types are in different branches of the type hierarchy, they are effectively incomparable.

NOTE: this conforming relationship is extended when we add protocols.

A.8.5 Testing for dynamic types

The is operator allows to test an object to check whether its dynamic type conforms to a specific static type.

type Bird {
    // ...
}

type Plane {
    // ...
}

type Superman {
    // ...
}

let x = new Superman() in
    print(
        if (x is Bird) "It's bird!"
        elif (x is Plane) "It's a plane!"
        else "No, it's Superman!"
    );

In general, before the is operator you can put any expression, not just a variable.

A.8.6 Downcasting

You can use the as operator to downcast an expression to a given static type. The result is a runtime error if the expression is not a suitable dynamic type, which means you should always test if you’re unsure:

type A {
    // ...
}

type B inherits A {
    // ...
}

type C inherits A {
    // ...
}

let x : A = if (rand() < 0.5) new B() else new C() in
    if (x is B)
        let y : B = x as B in {
            // you can use y with static type B
        }
    else {
        // x cannot be downcasted to B
    }

A.9 Type inference

Since every program in HULK is statically type-checked, and type annotations are optional in most cases, this means that HULK infers types for most of the symbols in a program.

Because the problem of type inference is computationally complex, and ultimately unsolvable in the general case, the HULK reference definition doesn’t give precise semantics about how the type inferer must work. Rather, we will give only a set of minimal constraints that the type inferer must assert if a type is inferred at all for a given symbol, or otherwise it must fail to infer types.

A.9.1 Type inference vs type checking

The type inferer works before the type checker, and assigns type annotations to all symbols that are not explicitly annotated, and to all the expressions. Afterwards, the type checker verifies that all semantic rules are valid.

Thus, even if a program is fully annotated, the type inferer still needs to work, since it needs to infer the type of all expressions. When some symbols are not explicitly annotated, the type inferer must also assign types for them.

Hence, there are two different moments when a semantic error can be reported. First, if the type inferer cannot infer the type of some symbol, a semantic error will be thrown to indicate the programmer that some symbol must be explicitly typed. Second, if the type inferer finished without errors, the type checker will verify that all types are consistent, and will report a semantic error if there is some incompatibilty.

A.9.2 Type inference of expressions

The first task of the type inferer is to infer the runtime type of any expression that appears in a HULK program. This process is performed bottom-up, starting from atomic sub-expressions (e.g., literals) and working up the AST. The exact rules for type inference of expressions is given in the section a`bout type semantics, but an intuitive introduction can be given at this point.

Literals are the easiest to type-infer, because their type comes directly from the parser. Arithmetic expressions are also easy, because their type is always Number. Likewise, string and boolean operators are straightforward.

The type of complex expressions that have an expression body is determined by the type of the body. This is the case of let, while, and for. The type of an expression block is the type of the last expresion of the block. The type of a function or method invocation is the type of its body. The type of expressions that have more than one branch (if) is the lowest common ancestor of the types of each branch, or ultimately Object.

A.9.3 Type inference of symbols

Once all expressions have been type-inferred, the type inferer will attempt to assing a type to each symbol declaration that is not explicitly annotated. Instead of providing an exact algorithm, we will define a set of constraints that the type inferer must satisfy whenever it succeeds in assigning a type.

Specific implementations of HULK can choose different methods to attempt the type inference of symbols. According to the order in which symbols are processed, and the sophistication of each method, some implementations may succed where others fail. However, if two type inference algorithms are correct, they most agree on all types for which both succeed in the inference.

These are the constraints a type inference algorithm must satisfy to be correct, or otherwise it must report a failed inference.

  • In a let expression, whenever a variable is not type-annotated, the type inferer must asign a type for the variable that is equivalent to the type infered for its initialization expression.
  • Similarly, in an attribute declaration that is not type-annotated, the type inferer must assign a type that is equivalent to the type inferred for its initialization expression.
  • In a function or method, whenever an argument is not type-annotated, the type inferer must assign the lowest (most specific) type that would be consistent with the use of that argument in the method or function body. If more than one type in different branches of the type hierarchy would be consistent, the type inferer must fail.
  • Similarly, in a type argument, the type inferer must assign the lowest type that is consistent with the use of that argument in all attribute initialization expressions where it is referenced.

If a type inferer satisfies those constraints, we will say it is sound. This means that, for example, the simplest sound strategy for type inference is to infer types for all expressions and fail for all symbols. We will call this the basic inference strategy.

A.9.4 Examples of ad-hoc type inference

These are some programs where a sufficiently sophisticated type inference strategy should work.

In the following program the type of the variable x should be inferred to Number because the type of 42 is trivially Number:

let x = 42 in print(x);

In the following function, the type of the argument n should be inferred as Number because it is the only possible type where arithmetic operators (i.e., +) are defined, as there is no operator overloading in HULK:

function fib(n) => if (n == 0 | n == 1) 1 else fib(n-1) + fib(n-2);

If you implement operator overloading, then the inferred type should be the appropriate protocol.

For the same reason, in the following function, the type of the argument x should be inferred as Number. Likewise, the type of the variable f should be inferred as Number because the initialization expression is a literal Number.

function fact(x) => let f = 1 in for (i in range(1, x+1)) f := f * i;

A.9.5 A general strategy for type inference

If you implement protocols (explained later), then a general strategy for type inference consists in synthesizing appropriate protocols for all non-annotated symbols, based on their use. Since protocols support structural type checking, this should allow the type checker to detect any inconsistencies in a later pass.

For example, consider the following code:

type A {
    f() => "Hello";
    g() => "World";
}

function h(x) => x.f() @@ x.g();

let x = new A() in print(h(x));

In the previous code, the type inferrer can determine that, whatever type x has, it should support two methods, f and g. Furthermore, given the use of the @@ operator, the return value of both methods should support the @@ operation (in principle, only String does, but if you implement operator overloading, there is a specific protocol for that operator).

Thus, the type inferrer can synthesize the following protocol:

protocol _P1 {
    f(): String;
    g(): String;
}

And it should annotate the code (actually, the AST) in a way that is equivalent to the following:

// type A and Protocol _P1

function h(x: _P1): String => x.f() @@ x.g();

let x: A = new A() in print(h(x));

From the point of view of the type checker, the previous code is semantically correct, since A conforms to the protocol _P1.

Note that the process of synthesizing protocols could require several iterations, since not all types in a synthesized protocol may be known at a first glance. For example:

function f(x) => x.a();

function g(x) => x.b();

let x = new T() in print(g(f(x)));

Regardless of how T looks like, the type inferrer here must first define a protocol for f that has a method a(), and analogous for g. But crucially, at this point, it is not clear from either f or g what the return type of these methods is.

Thus, at this point, the best a type inferrer can do is claim f receives something like:

protocol _P1 {
    a(): Any;
}

// ...

function f(x: _P1): Any => x.a();

And similarly for g:

protocol _P2 {
    b(): Any;
}

// ...

function g(x: _P2): Any => x.b();

Then, a series of passes on the AST start to refine these protocols. For example, the call to g in the last line of the code above will force the type inferrer to refine _P1 to:

protocol _P1 {
    a(): _P2;
}

Which in turns, makes f now return _P2. Likewise, the call to print makes the type inferrer refine _P2 to:

protocol _P2 {
    b(): Object;
}

Once now new information can be inferred, the type inferrer will stop and the program will be type checked. All types left as Any will be reported as errors.

NOTE: To code a robust type inferrer is much harder than what the previous explanation might seem. There are plenty of corner cases and heuristics. This section is just an initial suggestion to guide the implementation.

A.10 Protocols

Protocols are special types which support a limited form of structural typing in HULK. The difference between structural and nominal typing in HULK, is that the latter is explicit while the former is implicitely defined. That is, a type doesn’t need to explicitely declare that it conforms to a protocol.

Protocols have a syntax similar to that of types, except that they only have method declarations, and they have no body, only signatures. Hence, protocols define the methods that a type must have in order to support some operation.

Protocols don’t exist at runtime, they are compile-time only concept that helps writing more flexible programs. After type checking, all information about protocols can be safely removed.

A.10.1 Defining protocols

A protocol is defined with the keyword protocol followed by a collection of method declarations:

protocol Hashable {
    hash(): Number;
}

A protocol can have any number of method declarations. For obvious reasons, all method declarations in protocol definitions must be fully typed, as it is impossible to infer any types since they have no body.

A protocol can extend anoter protocol by adding new methods, but never overriding (since there is no actual body) or removing any method (althought you can override the types of some method arguments or return types provided with some restrictions explained below).

protocol Equatable extends Hashable {
    equals(other: Object): Boolean;
}

A.10.2 Implementing protocols

A type implements a protocol implicitely, simply by having methods with the right signature. There is no need to explicitely declare which types implement which protocols.

Thus, you can annotated a variable or argument with a protocol type, and the type checker will correctly verify the consistency of both the method body and the invocation.

type Person {
    // ...

    hash() : Number {
        // ...
    }
}

let x : Hashable = new Person() in print(x.hash());

Anywhere you can annotate a symbol with a type (variables, attributes, function, method and type arguments, and return values), you can also use a protocol. For the purpose of type inference, protocols are treated as types.

A.10.3 Variance in protocol implementation

In order to implementing a protocol, a type doesn’t necessarily have to match the exact signature of the protocol. Instead, method and type arguments are considered contravariant, and return values covariant. This means that arguments can be of the same type or higher, and the return values of the same type or lower than as defined in the protocol.

Similarly, when you extend a protocol, you can override some of the methods as long as you respect the variance constraints.

A.10.4 Conforming with protocols

More formally, protocols extend the notion of type conforming by adding the following rules:

  • A type T conforms to a protocol P if T has all the method defined in P with the appropriate types (respecting the variance constraints explained before).
  • If a protocol P1 extends a protocol P2, then trivially P1 <= P2.
  • A protocol P1 also conforms to another protocol P2 if any type that conforms to P1 would also conform to P2, even if there is no explicit extension declared.

A.11 Iterables

An iterable in HULK is any object that follows the iterable protocol, which is defined as follows:

protocol Iterable {
    next() : Boolean;
    current() : Object;
}

An example of iterable is the builtin range function, which returns an instance of the builtin Range type, defined as follows:

type Range(min:Number, max:Number) {
    min = min;
    max = max;
    current = min - 1;

    next(): Boolean => (self.current := self.current + 1) < max;
    current(): Number => self.current;
}

Notice that since protocols are covariant in the return types of the methods, the Range type correctly implements the Iterable protocol.

A.11.1 Using iterables with the for loop

As explained in the loops section, the for loop works with the Iterable protocol, which means you can apply for on any instance of a type that implements the protocol.

In compile-time, for is transpiled to a code that is equivalent, but explicitely uses the Iterable protocol members.

For example, the code:

for (x in range(0,10)) {
    // code that uses `x`
}

Is transpiled to:

let iterable = range(0, 10) in
    while (iterable.next())
        let x = iterable.current() in {
            // code that uses `x`
        }

This transpilation guarantees that even though the Iterable protocol defines the current method with return type Object, when you use a for loop you will get the exact covariant type inferred in x.

As a matter of fact, due to the transpilation process, the Iterable protocol itself is not even necessary, since nowhere is a symbol annotated as Iterable. However, the protocol is explicitely defined as a builtin type so that you can explicitly use it if you need to annotate a method to receive a black-box iterable.

Keep in mind, thought, that when you annotate something explicitely as Iterable, you are effectively forcing the type inferrer to assign Object as the type of the iteration variable (x in this example). This is one of the reasons it is often better to let HULK infer types than annotating them yourself.

A.11.2 Typing iterables

Since in the Iterable protocol we can only define (at this point) the return value of current() as Object, it is cumbersome to type arguments of a function or method as Iterable, because doing so will force you to downcast the elements to a desired type.

For this reason, HULK allows a special syntax for typing iterables of a specific type T using the format T*:

function sum(numbers: Number*): Number =>
    let total = 0 in
        for (x in numbers)
            total := total + x;

What happens under the hood is that when you use of T* anywhere in a HULK program, the compiler will insert an implicit protocol definition that looks like this:

protocol Iterable_T extends Iterable {
    current(): T;
}

Since protocols can be extended by overriding some methods with the correct variance constraints, the previous code will compile correctly.

A.11.3 Implementing collections

The iterable protocols defined so far encapsulates the concept of making a single iteration over the sequence of elements. In contrast, most collection types you will define allow for multiple iterations, even simultaneously, over the same sequence of elements.

To accomodate for this kind of behaviour, we can define an enumerable protocol that simply provides one method to create an iterable for one specific iteration everytime that is needed:

protocol Enumerable {
    iter(): Iterable;
}

With this protocol defined, the for loop is extended such that, when used with an enumerable instead of directly an iterable, it will transpile to a slightly different code:

let iterable = enumerable.iter() in
    while (iterable.next())
        let x = iterable.current() in {
            // ..
        }

A.12 Vectors

The builtin vector type provides a simple but powerful abstraction for creating collections of objects of the same type. In terms of functionality, a vector is close to plain arrays as defined in most programming languages. Vectors implement the iterable protocol so they can be iterated with a for syntax.

Vectors in HULK can be defined with two different syntactic forms: explicit and implicit.

A.12.1 Explicit syntax

An explicit vector of Number, for example, can be defined as follows:

let numbers = [1,2,3,4,5,6,7,8,9] in
    for (x in numbers)
        print(x);

Because vectors implement the iterable protocol, you can explicitely find a next and current methods in case you ever need them. Besides that, vectors also have a size(): Number method that returns the number of items in the vector.

Vectors also support an indexing syntax using square brackets [], as in the following example:

let numbers = [1,2,3,4,5,6,7,8,9] in print(numbers[7]);

A.12.2 Implicit syntax

An implicit vector can be created using what we call a generator pattern, which is always an expression.

Here’s one example:

let squares = [x^2 | x in range(1,10)] in print(x);
// prints 2, 4, 6, 8, 10, ...

In general, the syntax has the form [<expr> | <symbol> in <iterable>], where <expr> is run in a new scope where symbol is iteratively bound to each element in the vector.

A.12.3 Typing vectors

Since vectors are iterables, you can safely pass a vector as argument to method that expects an iterable:

function sum(numbers: Number*): Number =>
    let total = 0 in
        for (x in numbers)
            total := total + x;

let numbers = [1,2,3,4,5] in
    print(sum(numbers));

However, inside sum you cannot use the indexing operator [] or the size method, because the argument is typed as an iterable, and not explicitly as a vector. To fix this, HULK provides another special syntax for vectors, using the T[] notation:

function mean(numbers: Number[]): Number =>
    let total = 0 in {
        for (x in numbers)
            total := total + x;

        // here `numbers` is known to be vector
        total / numbers.size();
    };

let numbers = [1,2,3,4,5] in
    print(mean(numbers));

Like with iterables, what happens under the hood is that the compiler implicitely defines a type with the following structure:

type Vector_T {
    size() {
        // impementation of size ...
    }

    iter(): Iterable_T {
        // implementation of iter
    }
}

A.13 Functors

A functor in HULK is an object that encapsulates a function, which means it supports the obj() syntax. This can be accomplished with protocols easily, via transpilation. If you have a type that implements a functor protocol, then HULK will allow you to use the functor syntax. A functor protocol is any protocol that has an invoke method with appropriate type annotations.

For example, suppose you declare the following protocol in HULK:

protocol NumberFilter {
    invoke(x: Number): Boolean;
}

Then, you can annotate a function to receive an object that implements this protocol:

function count_when(numbers: Number*, filter: NumberFilter) {
    let total = 0 in
        for (x in numbers)
            total := total + if (filter.invoke(x)) 1 else 0;
}

But, since that protocol is a functor (it contains an invoke method), you can also use it directly as if it where a method, with the following syntax:

function count_when(numbers: Number*, filter: NumberFilter) {
    let total = 0 in
        for (x in numbers)
            total := total + if (filter(x)) 1 else 0;
}

To implement a functor protocol, you simply define a type that implements the protocol, as usual, and then you can use it:

type IsOdd {
    invoke(x: Number): Boolean => x % 2 == 0;
}

let numbers = range(0, 100) in
    print(count_when(numbers, IsOdd())); // prints `50`

But this syntax is extremely cumbersome, so HULK provides lots of syntax sugar to simplify the declaration and usage of functors.

A.13.1 Implicit functor implementation

The first aid that HULK provides is by implicitely implementing wrapping functions as functor types upong usage. For example, instead of defining the IsOdd type like before, you can simply define an is_odd function like the following, and pass it directly to the count_when function:

function is_odd(x: Number) => x % 2 == 0;

let numbers = range(0, 100) in
    print(count_when(numbers, is_odd));

And then HULK will automatically create an appropriate functor type that implements the desired protocol, which means the previous code is transpiled to something like the following:

function is_odd(x: Number) => x % 2 == 0;

type _IsOddWrapper {
    invoke(x: Number): Boolean => is_odd(x);
}

let numbers = range(0, 100) in
    print(count_when(numbers, _IsOddWrapper()));

Naturally, this syntax sugar extends to variable assignment as well, which means the following is valid:

let numbers = range(0, 100), filter: NumberFilter = is_odd in
    print(count_when(numbers, filter));

A.13.2 Lambda expressions

Keeping up with the previous example, we can eliminate the explicit is_odd definition and pass a lambda expression, which is an anonymous function defined directly in the place when the functor is needed:

let numbers = range(0, 100) in
    print(count_when(numbers, (x: Number): Boolean => x % 2 == 0));

The general syntax for lambda expressions is very similar to the syntax for inline functions, except that you don’t need to name the function.

Also, if the type inferrer is good enough, you can almost always drop the explicit type annotations:

let numbers = range(0, 100) in
    print(count_when(numbers, (x) => x % 2 == 0));

And of course, lambda expressions can be stored in appropriately typed variables:

let numbers = range(0, 100), filter: NumberFilter = (x) => x % 2 = 0 in
    print(count_when(numbers, filter));

And the type inferrer is good enough, since count_when requires a NumberFilter, you can drop the explicit type annotation:

let numbers = range(0, 100), filter = (x) => x % 2 = 0 in
    print(count_when(numbers, filter));

A.13.3 Typing functors

And finally, we can also skip the protocol definition and use a special syntax for typing functors directly in the type annotaion:

function count_when(numbers: Number*, filter: (Number) -> Boolean) {
    // same code
}

The syntax (Number) -> Boolean indicates that we expect a functor with a single input of type Number and an output of type Boolean. Upon finding this definition, HULK will transpile that into something that is very similar to our explicit protocol definition:

protocol _Functor0 {
    invoke(_arg0: Number) : Boolean;
}

function count_when(numbers: Number*, filter: _Functor0) {
    // same code
}

A.14 Macros

Macros are a way to extend HULK with “functions” that are transpiled at compilation-time to standard HULK, instead of executed in runtime. But macros are considerable more powerful than functions, both sintactically and semantically. Macros in HULK are extremely powerful because they work at the sintactic level, which means they perform transformations directly over the abstract syntax tree. Besides that, their syntax allows to define sort of keyword-like language constructs.

Since macros are a complex topic, let’s start with a simple scenario.

Suppose you want to have something like the following in HULK:

repeat(10) {
    // expressions
}

You quickly see that this code is equivalent to the (arguably a lot more verbose) following syntax:

let total = n in
    while (total >= 0) {
        total := total - 1;
        // expressions
    };

You can easily encapsulate this pattern in a repeat function that takes a number and an a general expression (as a functor):

function repeat(times: Number, expr: () -> Object): Object {
    let total = n in
        while (total >= 0) {
            total := total - 1;
            expr();
        };
}

And while this may work for your case, it has a couple of downsides. First, you don’t exactly get the desired syntax, instead of:

repeat(10) {
    // expressions
}

You have to write something like the following, which is close, but still slightly more cumbersome and dirty.

repeat(10, () => {
    // expressions
});

The second, and most important one, is that the expr here encapsulates a computation that, from the point of view of the repeat function, is a black box. We will focus on why this matters later on.

A.14.1 Defining macros

Instead of a function, you can use a macro, which has a very similar syntax in HULK:

def repeat(n: Number, *expr: Object): Object =>
    let total = n in
        while (total >= 0) {
            total := total - 1;
            expr;
        };

But this change makes macros exceedingly more powerful than functions in a lot of cases, for a few reasons. First, notice the use of the *expr: Object syntax, instead of the expr: () -> Object. Here the * denotes that this expr is not a regular argument, instead it is a special argument that refers to the code inside the brackets after the macro invocation. Thus, you can use the following syntax:

repeat(10) {
    print("Hello World");
}

The { print("Hello World"); } expression block is precisely what is passed on in the special argument *expr.

However, there is much more going on under that macro invocation. Instead of calling a functor in runtime, macros are expanded in compile time and transpiled into their bodies, which means there is no real repeat function anywhere in the compiled code. Instead, the actual code that is executed is something like:

let _total = 10 in
    while (_total >= 0) {
        _total := _total - 1;
        {
            print("Hello World");
        };
    }

This is the reason why you don’t see expr(); in the macro body, but expr;. That is, the body is not executed but interpolated inside the macro. This transpilation step makes macros often faster than functions because there is no extra overhead for passing arguments, however, you must be careful when thinking about the operational semantics of a macro especially where they differ from a regular function call.

A.14.2 Variable sanitization

Upon macro expansion, the variables inside the body of a macro are replaced with a special unique name generated by the compiler. This ensures that no variable in the context of the macro invocation can be accidentally hidden or used in unpredictable ways.

Take for example the following code:

let total = 10 in repeat(total) {
    print(total);
};

If variables inside the body of the repeat macro wheren’t sanitazed, then the print statement would print 9, 8, etc, which is kind of unexpected unless you happen to know how the repeat macro is implemented, violating the principle of encapsulation. Even worse, this would happen if your variable is named total, but not if it’s named something else, which again is surprising and inconsistent. However, since the variable total inside the body of repeat will be renamed to something completely different upon macro expansion, you can be certain that the print statement will work as expected, regardless of the name you happen to choose for your variable.

A.14.3 Symbolic arguments

There are times, though, when you want the macro to reuse a symbol that comes from its external context (a variable or attribute). In these cases, you can use the especial syntax @symbol to define a symbolic argument in the macro, and then bind a specific symbol upon macro expansion.

This is best explained with an example. Let’s suppose we want to implement a swap macro that swaps the content of two variables. This cannot be done unless the macro can actually assign to the variables we want to swap. We would define the macro as:

def swap(@a: Object, @b: Object) {
    let temp: Object = a in {
        a := b;
        b := temp;
    }
}

And we invoke the macro as:

let x: Object = 5, y: Object = "Hello World" in {
    swap(@x, @y);
    print(x);
    print(y);
};

Which will be expanded to something like (except that _temp will be a generated name):

let x: Object = 5, y: Object = "Hello World" in {
    let _temp = x in {
        x := y;
        y := _temp;
    };
    print(x);
    print(y);
};

Notice how the actual names of the x and y variables are interpolated in the macro expansion. Of course, the type checker will guarantee that on invocation the x and y symbols are variables of the corresponding type.

A.14.4 Variable placeholders

Macros can also introduce a new symbol into the scope in which they are expanded, which can then be used in the body argument (or the other arguments). The syntax for this is $symbol. We call this a “variable placeholder”, because it holds the name for a variable that will be introduced upon macro expansion.

Again, this is best explained with an example. Let’s add a variable to the repeat macro to indicates the current iteration. We would define the macro as:

def repeat($iter: Number, n: Number, *expr:Object) {
    let iter: Number = 0, total:Number = n in {
        while (total >= 0) {
            total := total - 1;
            expr;
            iter := iter + 1
        };
    }
}

Now when calling the macro, you can specify a name for the $iter variable placeholder:

repeat(current, 10) {
    print(current);
};

The effect is that upon macro expansion, the variable placeholder $iter will be renamed to current and thus the body of the macro will correctly reference it. The actual expansion looks similar to the following code:

let current: Number = 0, _total:Number = n in {
    while (_total >= 0) {
        _total := _total - 1;
        {
            print(current);
        };
        current := current + 1
    };
};

The compiler ensures that the use of the new variable in the body of the macro is consistent with the type declared for the variable placeholder in the macro. However, it is entirely possible for the macro not to define the variable, or to define it conditioned on some structure of the body (we will see how that’s achieved in the pattern matching section). In any case, since macro expansion is performed at compile time, any inconsistency that may arise will be captured by the compiler.

A.14.5 Pattern matching

By far the most powerful feature of macros is structural pattern matching. This feature allows to deconstruct an argument and generate a specific code depending on the argument structure. The reason this is possible is because macros run on compile time, so when you declare an argument of type Number, for example, what you’ll get in the macro body is the actual expression tree of the argument, and not just the final evaluated object.

As everything else with macros, this feature is much better understood with examples. Let’s suppose you want to define a macro called simplify, for no better use than to illustrate how powerful macros are compared to regular functions. This is how you would do it:

def simplify(expr:Number) {
    match(expr) {
        case (x1:Number + x2:Number) => simplify(x1) + simplify(x2);
        case (x1:Number + 0) => simplify(x1);
        case (x1:Number - x2:Number) => simplify(x1) + simplify(x2);
        case (x1:Number - 0) => simplify(x1);
        case (x1:Number * x2:Number) => simplify(x1) * simplify(x2);
        case (x1:Number * 1) => simplify(x1);
        // ... you get the idea
        default => expr;
    };
}

You would use the macro as follows:

print(simplify((42+0)*1));

And the actual generated code would be:

print(42);

Notice that this transformation happens during compilation time, not execution. The actual code that gets compiled is the simplified expression.