Of Chickens and Eggs

Tuesday, February 21, 2012

The Design of Hammer: Part IV

The war between dynamic and static type systems is almost as old as the idea of programming languages itself. Hell, I get into one of these debates at least once a month. The result is often a stalemate, in which the combatants agree that each philosophy is useful in its niche. When you want to exercise your creativity without being hindered by petty rules, dynamic typing is the way to go. On the other hand, when you want your program to run fast and crash less, static typing is a must.

In the previous post I stated that, when faced with a trade-off between safety and convenience, Hammer will always err on the side of safety. The trend continues here.

First, I want to clarify some terms. When I say that a type system is strong, I mean that it gives each value one type and one type only. If a language, at any point, causes a value to be cast, coerced, or converted into a different type without the programmer's explicit request, then that language has a weak type system. PHP, JavaScript, Ruby, and early flavors of C fall into this category. The general consensus is that weak typing is almost always a bad idea, because it leads to subtle, hard-to-find bugs.

When I say that a type system is static, I mean that it performs type checking prior to executing code. A dynamic type system checks types when the program is already running, so that it has more information with which to determine whether to indicate an error. Unfortunately, this also means that values in the language need to be tagged with their types in some way and that processing cycles are stolen from the actual program for validation duty. Some cleverness in language implementations can minimize these costs, but it is impossible to entirely remove the overhead of dynamic typing.

Static type systems have their own problems. Type checking without run-time information will always reject some programs that are actually valid. The number of false negatives can be reduced by making the type system more expressive, but a "perfect" type system is impossible to create, being equivalent to a general solution of the halting problem. More expressive type systems are also often more complex. Despite these disadvantages, a sound and expressive static type system is a must for Hammer.

Robin Milner once said: "Well-typed programs don't go wrong." Indeed, a sound type system can completely eliminate a whole host of common and nasty bugs. Languages like ML and Haskell never have to deal with any run time type mismatch errors. With extra cleverness, it is even possible to guarantee things like:

You will never attempt to access the head of an empty list.
You will try to multiply two matrices A and B if and only if A : R^{p * m}, B : R^{m * q} and the result C : R^{p * q}.
Every red-black tree you construct will satisfy the color invariants.
Many, many more.

Note that none of the guarantees of the type system incur run-time overhead. This is very important for a language like Hammer, which does not have garbage collection and thus would not benefit from a "boxed" representation of data. Hammer, like C, deals with raw integers, floats, and pointers. Because types have no "baggage", we get optimal performance and simple FFI marshaling.

Most important, the type system constrains side effects. Types become reliable documentation for exactly what a function will do. One can be sure that foo : Int -> Int is harmless, whereas launch_missiles : IO () warrants some inspection.

Wednesday, February 8, 2012

The Design of Hammer: Part III

When I took a course on type systems and programming languages, our professor drilled into our heads the actual meaning of the word "variable". I have decided to stick with this definition in the design of Hammer, for several reasons.

As mentioned in a previous post, the nature of systems programming demands that the applications it produces must adhere as closely as humanly possible to their specifications. Any bugs that exist can and will manifest sooner or later (and probably sooner than you think). A web server, for example, must be able to stay online continuously for days, weeks, or even months without crashing. A computer game, on the other hand, can crash routinely without much consequence (as long as the player remembers to save early and save often).

Thus, when faced with a trade-off between safety and convenience, Hammer will always err on the side of safety. Convenience is useful when you're doing exploratory programming or trying to implement a proof of concept. But, in the end, a program will be written only once but run countless times. In the long run, having to write a few extra lines of code is but a small price to pay for a better result.

Unrestricted mutability is one of those conveniences that Hammer will sacrifice. First, the downsides:

Any "global variable" will be harder to represent.
Efficient loops will take some ingenuity to fully implement.
Lazy evaluation will require syntactic sugar (to be elaborated in a later post).

Note that removing mutability didn't cause anything to become impossible, only slightly more wordy to implement. Bit-twiddling is still on the menu. The important thing to note is that Hammer, in exchange, gains the following very desirable properties:

As explained in the previous post, treating all variables as immutable enables a simple implementation of closures without automatic memory management.
The type of a function fully encodes everything that that function will do. The programmer can rely on types as documentation.
The type system becomes far simpler because it is easier to encode mutability in an initially immutable system than the other way around. C and its descendants have to struggle with complex rules regarding the propagation of consts and finals; Hammer, like ML and Haskell, handles the situation naturally via simple value semantics.
The extra guarantees that come from immutability facilitate optimization. When there is no such thing as aliasing, and the equivalent of GCC's __pure__ attribute is applied to almost every function, the compiler gets a blank check to do inlining and subexpression elimination.
Immutability makes implementing concurrency easier. Arguably, 99% of the pitfalls described in this paper do not apply. This point will be discussed in yet another post.

One might argue that removing mutability causes the programming language to move away from being a faithful representation of the underlying machine. To that, I have two responses:

Programming languages are primarily for humans to use. If you want a language that reflects the machine, go code in raw assembly. The only thing that matters is that the result is fast.
Modern machines are far more complicated than the simple five-stage pipeline you learned about in your computer architecture course. I think that immutability fits the out-of-order and highly concurrent nature of a recent multicore processor even better than something like the old C abstract machine.

And, of course, the above is pure opinion. I am open to discussion.

Monday, February 6, 2012

The Design of Hammer, Part II

The main appeal of first-class functions is that they are ridiculously simple (yes, really) and yet also ridiculously powerful. For proof, one only needs to take a look at the design of the original Lambda Calculus, the very first formalization of the notion of universal computation. Due to their appeal, first-class functions have pretty much become a must-have feature of almost all "new" programming languages. Even the notoriously conservative Java is considering "lambdas" for its next iteration.

Though the idea of first-class functions is pretty simple, their implementation is not. The vast majority of languages that provide them also require some sort of automatic memory management. Those that insist on manual memory management, like C++ or ATS, require the programmer to specify where to allocate the set of captured variables (henceforth called a "closure") and when to deallocate them.

The main difficulty comes from the interaction of lexical scoping and unrestricted mutability. If the free variables of the closure are simply captured as copies, then when one of them is modified the effect will not be observable from its original scope. On the other hand, if they are captured as pointers, then it is unsafe to do anything with them since the originals may have gone out of scope in the time between the closure's construction and its invocation. The mainstream solution is to allocate all variables on the heap and deallocate them if and only if the runtime can prove that they are no longer needed.

Unfortunately, this solution is unacceptable for a systems programming language. The details of automatic memory management are mandated by the language's implementation, which the programmer has no control over. If you're lucky, you'll have one of those cutting-edge pause-less garbage collectors. If not, good luck with writing real-time applications. If you are working on a embedded platform, your machine might not even have enough memory to accommodate the overhead of garbage collection.

Hammer avoids the above problems by assuming that variables are immutable. This allows closures to indiscriminately capture by value, eliminating the need for garbage collection. Imagine a naive implementation:

struct foo {
size_t size; // = sizeof(struct foo);

return_t (*function)(struct foo *, parameter_t);
t0 var0;
t1 var1;
...
};

The programmer can handle a lambda expression like any other literal, so he has complete control over where the closure is stored. To invoke:

...
return_t val = (*foo.function)(&foo, parameter);
...

Simple, isn't it? Immutability by default has other benefits, which will be explained in the next post.

Tuesday, January 24, 2012

The Design of Hammer: Part I

A project needs concrete goals, or else it will wander aimlessly like a headless chicken. In the case of a programming language, this requirement is even more important, because all programming languages eventually drift towards the general purpose, regardless of their original niche.

Hammer's niche is "systems programming", a term that is admittedly nebulous. For the purposes of this post, I shall define it as the act of writing software under the following set of requirements:

Resources are critically limited. That is, if the software exceeds its budget, the project will be deemed a failure. Efficiency isn't simply "nice to have"; it's essential.
The platform on which the application runs provides few or no abstractions. "Convenience" must be ported to the platform or be created from scratch.
The behavior of the software must not violate expectations. Clients regard the existence of bugs as an indication of failure rather than a mere inconvenience.

The canonical example of systems programming is, of course, the engineering of an operating system. These projects, due to their nature as the foundation upon which almost all other pieces of software are built, must fulfill the requirements given above, or else perish into obscurity. On the other hand, software like web browsers, word processors or games do not fall into this category.

To enable and facilitate systems programming, Hammer has adopted the following characteristics:

Higher order functions.
Immutability by default.
A strong, static type system.
Manual memory management.
Manual thread management.

Later posts will expand on the reasoning behind each of these design decisions.

Monday, February 21, 2011

Wait a minute, I have a blog?

I completely forgot I have this blog. It seems that I created it almost three years ago. Hot damn.

So what does one do on blogs these days? Just tell the world what one is up to? Well, I had to drop a class, so now I can focus better on:

The startup.
CUDA path tracer in Haskell.
Well-typed systems programming language.

More about them later.

Thursday, May 8, 2008

What's with the title?

This blog isn't about how to fry up a mean omelette, or how to raise poultry. Of course, omelettes and poultry are delicious, so I would have no qualms about changing the subject if given the incentive. But no. For now, this blog is about what bothers me and what must bother many other people too, even if they don't realize it:

The world runs in vicious cycles.

Hence the title. Everybody knows of the "Chicken v. Egg" problem. Which one came first? You need a chicken to lay an egg, but an egg must first hatch the chicken. Never mind that evolutionary biology has already resolved the dilemma. Primitive worms have been laying eggs long before the first Gallus gallus emerged from its shell. Now this would suggest a "Worm v. Egg" problem, but that can be resolved as well. The first genetic mutation that resulted in the production of an egg must have occurred in a species that did not previously produce eggs. Thus, because that individual is considered the first egg-laying worm, and it did not hatch from an egg, worms must have come before eggs. The reproductive details of worms and birds, however, are unimportant in the philosophical consideration of the problem.

And that is what this blog is about. Problems. Lots of them. An endless stream of them. Neatly packed into walls of text and delivered in pixel form to whomever bothers to care.