Saturday, 28 February 2015

Of the Origin of Aml

I started working on the idea of my own programming language a couple of years ago. At first, I was driven by the lack of certain features in multiple languages; basically none of them “had it all”. PHP didn’t have proper typing, Ruby had practically none because it is so dynamic, and other languages, like Java, were just not dynamic enough. It always felt like you had to choose between dynamic and static, and maybe, if you were a good boy, you got a little something from the other world, like the new dynamic variable type in C#, or the limited reflection capabilities in Java.

I also thought “hell yeah, that is a challenge” when it came to creating a new language from scratch. I did consider creating just a transpiler to some existing environment, but that wouldn’t have been good enough for me: not flexible enough.

So I started writing an interpreter for the new language in PHP, just to quickly sketch some ideas and then have a base to rewrite in another language. That didn’t work. I realised I needed a guide to follow while designing each part of the language.

So I started writing a specification of the language first. That introduced another set of issues: which format should the documentation use so that it would be sustainable? All those Word/Pages formats were just too high level, and versioning such files meant working with binary files. But by the time I realised this, I already had quite a large body of documentation for a very early form of the upcoming language.

That changed a lot when I first met Scala and its documentation. It was written in \LaTeX, which was just the right thing to use when I wanted the documentation to be properly versioned in git. Also, while reading it, I was struck by how many important or smart features my language design was missing. So over the following few months, the language design morphed into something that only distantly resembled the original.

The language itself is now very hybrid in many aspects.

When it comes to typing, it is primarily statically typed, but lets you opt in to dynamic typing. How does that work? At the variable level. When a variable is bound to a particular type, only values that conform to that type may be assigned to it. When a variable is not bound to any particular type, any value is permitted. Or, if your function returns a dynamically typed result, you might want to either cast it to a particular type, or evaluate it early and then have its runtime type available. Constructors of objects usually know in advance how many instance variables to make room for. Also, not really related to the typing discipline, type arguments are reified in the design, unlike in Java or Scala, because even classes are regular objects (with some extra baggage).
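
Aml’s own syntax is not shown in this post, so here is only a rough Python analogy of the variable-level idea. The `Variable` class and the `cast` helper are hypothetical names invented for this sketch, and the check happens at runtime here, whereas a statically typed language would reject the bad assignment before the program runs.

```python
from typing import Any, Optional


class Variable:
    """A variable slot that is optionally bound to a type (hypothetical helper)."""

    def __init__(self, name: str, bound_type: Optional[type] = None):
        self.name = name
        self.bound_type = bound_type  # None means "not bound to any type"
        self._value: Any = None

    def assign(self, value: Any) -> None:
        # A bound variable only accepts values that conform to its type.
        if self.bound_type is not None and not isinstance(value, self.bound_type):
            raise TypeError(
                f"{self.name}: expected {self.bound_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self._value = value

    def get(self) -> Any:
        return self._value


def cast(value: Any, target: type) -> Any:
    """Check a dynamically typed result against a particular type."""
    if not isinstance(value, target):
        raise TypeError(f"cannot cast {type(value).__name__} to {target.__name__}")
    return value


count = Variable("count", int)   # bound to int
anything = Variable("anything")  # not bound, so any value is permitted

count.assign(3)
anything.assign("text")
anything.assign(3.5)

try:
    count.assign("three")        # does not conform to int
    rejected = False
except TypeError:
    rejected = True
```

The unbound variable happily switches from a string to a float, while the bound one refuses the string; `cast(anything.get(), float)` is the "cast a dynamic result to a particular type" step.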

When it comes to being a static or a dynamic language, it is again a hybrid of sorts, but by definition dynamic, and it takes that to the next level, Erlang-style. The usual functionality of the language is pretty static. But there are easily accessible functions to turn this all around and manipulate the runtime in a lot of ways: adding classes at runtime, adding new instance variables, adding new methods to selected objects (like Ruby’s singleton classes) or to whole classes (the open class principle), and moreover, upgrading the whole program (or selected compilation units) at runtime. There are of course limitations on what can be done. For example, constraints set by the code must always be met, otherwise errors are raised, and a new version of a compilation unit must never remove a class that still has member values in existence.
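
To make the list of runtime manipulations a bit more concrete, here is how the same kinds of operations look in Python. This is only an analogy: the `Point` class and the function names are made up for the example, and nothing here reflects Aml’s actual API or its constraint checks.

```python
import types


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
q = Point(3, 4)


# Add a method to one selected object only (similar to a Ruby singleton method).
def describe(self):
    return f"Point({self.x}, {self.y})"


p.describe = types.MethodType(describe, p)


# Reopen the whole class: every instance, existing or future, gains the method.
def norm1(self):
    return abs(self.x) + abs(self.y)


Point.norm1 = norm1

# Add a new instance variable to an existing object.
q.label = "corner"

# Create a brand new class at runtime.
Point3D = type("Point3D", (Point,), {"z": 0})
```

After this runs, only `p` answers `describe()`, both points answer `norm1()`, and `Point3D` is an ordinary subclass of `Point` even though it never appeared in the source as a class definition.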

Then there are some smaller parts, like nullability. This feature was heavily influenced by Apple’s Swift. The nil value in Aml is an object, of course, but its class has the special ability that it conforms to any other class. Still, it can’t be assigned to just any variable, to prevent unexpected NullPointerException-like errors: there has to be an explicit statement that nil is a legitimate value for that variable, or maybe a pragma that sets that up. Also, there is an implicit conversion from nil to the None value of the Option type.
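
A small Python sketch of the opt-in rule, under loud assumptions: `Nil`, `Slot`, `Some`, `NONE` and `to_option` are all hypothetical names for this example, the check is done at runtime rather than by a type checker, and the conversion to an Option is an explicit function call here rather than an implicit one.

```python
from dataclasses import dataclass
from typing import Any


class Nil:
    """Stand-in for the nil object (hypothetical; not Aml's real class)."""

    def __repr__(self) -> str:
        return "nil"


NIL = Nil()


@dataclass
class Some:
    value: Any


NONE = "None"  # placeholder for the empty Option (hypothetical encoding)


class Slot:
    """A typed variable that rejects nil unless it was declared nullable."""

    def __init__(self, bound_type: type, nullable: bool = False):
        self.bound_type = bound_type
        self.nullable = nullable
        self.value: Any = None

    def assign(self, value: Any) -> None:
        if value is NIL:
            # nil conforms to every class, but the variable must opt in.
            if not self.nullable:
                raise TypeError("nil is not a legitimate value for this variable")
        elif not isinstance(value, self.bound_type):
            raise TypeError(f"expected {self.bound_type.__name__}")
        self.value = value


def to_option(value: Any):
    """Model the conversion from nil to the empty Option."""
    return NONE if value is NIL else Some(value)


name = Slot(str)                     # nil not allowed
nickname = Slot(str, nullable=True)  # nil explicitly allowed

name.assign("Ada")
nickname.assign(NIL)

try:
    name.assign(NIL)
    nil_rejected = False
except TypeError:
    nil_rejected = True
```

The point of the sketch is that nil passes the type check for any slot, and what stops it is the separate nullability flag.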

Or macros, influenced by Scala this time, since C-like macros have a lot of drawbacks. Macros in Aml work with the AST representation of the source code, not the source text itself. They also make it possible to create type providers. And one more interesting thing: they also work at runtime. How? By saving the AST in the bytecode, rather than any other compiled representation. The AST is enough for the interpreter to know what to do and to do it efficiently. This allows for optimisations that are not available during compilation. For example, when referentially transparent functions are used properly together with the static typing discipline, whole portions of the AST may be replaced by a single value upon the functions’ first evaluation.
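
The last optimisation can be illustrated with a toy AST interpreter in Python. Everything here is an assumption made for the sake of the example: the `Const` and `Call` node types, the `pure` flag that marks a function as referentially transparent, and the `evaluate` function are invented and say nothing about how Aml actually represents or rewrites its AST.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Const:
    value: object


@dataclass
class Call:
    func: Callable
    args: List["Node"]
    pure: bool = False  # hypothetical flag: the function is referentially transparent


Node = Union[Const, Call]


def evaluate(node):
    """Evaluate a node and return (value, node to keep in the tree from now on)."""
    if isinstance(node, Const):
        return node.value, node
    values = []
    all_const = True
    for i, arg in enumerate(node.args):
        value, new_arg = evaluate(arg)
        node.args[i] = new_arg  # keep any replacement made further down
        values.append(value)
        all_const = all_const and isinstance(new_arg, Const)
    result = node.func(*values)
    # A pure call whose arguments are all constants can never give another result,
    # so the whole subtree is replaced by the value it produced.
    if node.pure and all_const:
        return result, Const(result)
    return result, node


calls = []


def square(x):
    calls.append(x)  # record every real invocation
    return x * x


tree = Call(square, [Call(square, [Const(3)], pure=True)], pure=True)
first, tree = evaluate(tree)   # runs square twice, then collapses the tree
second, tree = evaluate(tree)  # just reads the constant
```

After the first evaluation the whole tree has become a single `Const` node, so the second evaluation does not call `square` at all.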

And, of course, tagged unions, a feature that is missing from PHP, Ruby, Java… plus a lot more features that will not fit into this blog post.