Aml Internals: 2015

Sunday, 24 May 2015

Informally about the recent changes

First of all, the name changed. This programming language used to be called Coral, but knowing that another language shares part of that name (CORAL 66) kept bugging me. After all, I’m not Apple Inc.; I can’t afford to give a language exactly the same name as an existing one, as they did with Swift (yes, there are now two languages called Swift). And the short gemstone names are already taken, Ruby being one of them. So I kept thinking about it while working on now-former-Coral, and finally came up with an idea: Gear. A tiny wheel inside a clockwork that gives the machine life. I thought that was pretty cool, and it is one letter shorter than Coral. In this early phase, it took just a few hours to rename the whole language. Then the author of another language with the same name contacted me, and I abandoned the name Gear in favour of Amlantis, or Aml for short, which has some properties that the name Gear did not have.

But the early phase is slowly coming to an end, and the language is maturing very fast.

A few examples of the changes that were made in the past few days:

Comments

Comments are pretty important, no matter what your clean-code teacher tells you. They express ideas that interfaces alone cannot express; they are like embedded metadata of your source code. If you have more such metadata than actual source code, though, you might be doing it wrong. Anyway, there were recently three changes to the syntax of comments. First, /* and */ multi-line comments were replaced with OCaml-y (* and *), and then single-line comments went from // to ;; to no single-line comments at all! Why? Because I realized I needed ;; for something else, and the inspiration from Lisps, where ; introduces all comments, just was not reason enough to keep it. Also, I needed // for an operator. Now comments are unambiguous: there is just one kind of comment (plus its documentation variant, (*! and *)). Moreover, it is now pretty easy to turn a single-line comment into a multi-line comment because, well, the single-line comment is already written in the multi-line comment syntax. Yay!
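To make the single-syntax idea concrete, here is a minimal Python sketch of a lexer pass that drops (* ... *) comments and collects (*! ... *) documentation comments. It assumes that comments do not nest and that "(*" never occurs inside a string literal; Aml’s real lexer may differ on both points, and the function name is made up for illustration.

```python
def strip_comments(source: str) -> tuple[str, list[str]]:
    """Remove (* ... *) comments from `source` and collect (*! ... *) doc comments.

    Assumptions (not taken from the Aml specification): comments do not nest,
    and "(*" never appears inside a string literal.
    """
    code: list[str] = []
    docs: list[str] = []
    i = 0
    while i < len(source):
        if source.startswith("(*", i):
            end = source.find("*)", i + 2)
            if end == -1:
                raise SyntaxError(f"unterminated comment starting at offset {i}")
            body = source[i + 2:end]
            if body.startswith("!"):
                # Documentation comment: keep its text for tooling.
                docs.append(body[1:].strip())
            i = end + 2  # skip past the closing "*)"
        else:
            code.append(source[i])
            i += 1
    return "".join(code), docs
```

A "single-line" comment here is simply a (* ... *) that happens to fit on one line, so stretching it over several lines needs no change of syntax.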

Workflows

This change is in fact still in progress. I realized that some parts of workflows in the former version did not quite do what they were supposed to do, so I went back to the original idea in F# and studied what its specification says workflows (called computation expressions there) actually do. And now I’m redoing the whole thing for Aml. Unfortunately for me, F# has some syntax goodies that Aml does not have, Aml has some syntax that F# obviously does not have, and I have to decide where and how workflows will apply. The main difference between workflows in Aml and computation expressions in F# is that Aml has to be able to perform the translations at runtime, via runtime macros.
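For readers who have not met computation expressions: a workflow is translated into calls on a builder object, so that each let! becomes a bind and the final return becomes a return. The Python sketch below performs that translation at runtime over a list of steps, loosely mirroring the requirement that Aml translate workflows via runtime macros. The step format, OptionBuilder and run_workflow are hypothetical illustrations, not Aml’s API.

```python
from typing import Any, Callable, Optional


class OptionBuilder:
    """A builder in the style of F# computation expressions; None means failure."""

    def bind(self, value: Optional[Any], rest: Callable[[Any], Any]) -> Any:
        # A missing value short-circuits the rest of the workflow.
        return None if value is None else rest(value)

    def return_(self, value: Any) -> Any:
        return value


def run_workflow(builder: Any, steps: list, env: Optional[dict] = None) -> Any:
    """Translate steps into nested builder calls at runtime and run them.

    Each step is ("let!", name, expr) or ("return", expr); every expr is a
    function of the current environment, standing in for an AST node.
    """
    env = dict(env or {})
    kind, *args = steps[0]
    if kind == "return":
        (expr,) = args
        return builder.return_(expr(env))
    if kind == "let!":
        name, expr = args
        # let! name = expr in <rest>  ~>  builder.bind(expr, fun name -> <rest>)
        return builder.bind(
            expr(env),
            lambda value: run_workflow(builder, steps[1:], {**env, name: value}),
        )
    raise ValueError(f"unknown workflow step: {kind!r}")
```

Because the nesting is built while the steps are walked, the same translation could in principle be applied to AST received at runtime, which is the property Aml needs and F# does not.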

Vendors & Modules

I always wanted to get this part right, but never quite managed it in Coral. Now, with Aml, I thought hard about what the real use cases could be, and finally decided to change Aml’s syntax for modules and vendors. In Aml’s context, a vendor is somebody, or some organization, that ships its modules to other users, in compiled binary form, in source form, or maybe both. And a module could be a lot of things: just a library, a utility, a complex GUI application, or even a combination of those. Originally, I thought it would be necessary to distinguish a module, a vendor and a class in any path. So I came up with the Module~[Vendor].Class syntax, where the vendor was just an extra property of the module name; the language could then list the referenced modules in one place using their full names including vendors, and use the simple name everywhere else. But that syntax is pretty cumbersome, lengthy, and not easy to type fast. So I thought… well, on Packagist or GitHub, "modules" are referred to with something like Vendor/Module, so why not use that? A slash was already an allowed character within an identifier, so the change would only delete the extra syntax and leave everything else the same. And then I realized that while a slash is indeed allowed in identifiers, it is allowed in ALL identifiers: not just module names, but classes, functions, methods, damn, even variables. Wouldn’t that be a problem in name resolution? And then I finally realized it is no more of a problem than it already was. Module names including their vendors will already be imported from the module definition; other places must obviously import the names themselves, since the language can’t do that for them.
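The reason a slash in every identifier adds no new ambiguity is that the resolver never splits on it: Acme/Json is looked up as one whole name, exactly as a variable named input/output would be. A small Python sketch of that idea follows; the Scope class, the module_scope helper and the vendor and module names are all hypothetical.

```python
from typing import Any, Optional


class Scope:
    """A lexical name table in which '/' is an ordinary identifier character."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.names: dict[str, Any] = {}
        self.parent = parent

    def define(self, name: str, value: Any) -> None:
        self.names[name] = value

    def resolve(self, name: str) -> Any:
        # The name is never split on '/': "Acme/Json" is one identifier,
        # resolved exactly like a local variable called "input/output".
        scope: Optional[Scope] = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"unresolved identifier: {name}")


def module_scope(dependencies: dict[str, Any]) -> Scope:
    """Import every vendor-qualified module listed in a module definition."""
    scope = Scope()
    for qualified_name, module in dependencies.items():
        scope.define(qualified_name, module)
    return scope
```

Anything not imported, such as a bare Json, simply fails to resolve, which is the same rule that already applied to every other identifier.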

Syntactic Forms

This is another syntax borrowed from Lisps that did not quite fit into Aml. But it survived, just with a different set of delimiters: <@ quasi-quote @> and <@@ quote @@>. For now, I also kept a simple syntax to ''quote expressions that are easy to quote. The new delimiters fit in with other syntax that needs to draw attention, like goto <<labels>>, which comes from Ada, just customized.
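The difference between the two delimiters can be sketched in Python with ASTs as nested tuples: a quote returns the tree untouched, while a quasi-quote copies it and fills in holes. Aml’s unquote syntax is not described here, so the ("unquote", name) hole marker below is a hypothetical stand-in.

```python
from typing import Any


def quote(tree: Any) -> Any:
    """Like <@@ ... @@>: the tree is returned as data, nothing is substituted."""
    return tree


def quasi_quote(tree: Any, env: dict[str, Any]) -> Any:
    """Like <@ ... @>: copy the tree, splicing values from env into the holes.

    A hole is written ("unquote", name), a hypothetical marker, not Aml syntax.
    """
    if isinstance(tree, tuple):
        if len(tree) == 2 and tree[0] == "unquote":
            return env[tree[1]]
        return tuple(quasi_quote(child, env) for child in tree)
    return tree
```

For example, quasi-quoting ("call", "+", ("unquote", "lhs"), ("const", 1)) with lhs bound to ("var", "x") yields ("call", "+", ("var", "x"), ("const", 1)), whereas quoting it returns the template with the hole still in place.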

Multi-purpose Yield

I spent quite some time thinking about this, and came to realize it was necessary. Originally, yield was supposed to do what it does in Ruby: pass arguments to a block (basically a lambda) that was given to the invoked function. But it also had extra uses in generator expressions and loops. Also, there is this Fiber.yield thing planned for Aml, and it again does what it does in Ruby, maybe even more. That makes three different purposes for the yield keyword, if we ignore things like Thread.yield, which do not accept any value and might have an outcome somewhat similar to Fiber.yield, but not quite the same. Then I quickly tested what yield does in Ruby if no block was given: it is an error condition. Ha! After that, it was easy to write down a list of simple, runtime-checkable rules that apply to yield in the three different cases; just search for yield expressions in the TOC of Aml’s specification. But it did not go without trouble: workflows. What if, say, a function that yields for a loop is used within a workflow expression? I checked what the F# specification has to say about that, and it seems that for the seq<_> thing, it would happily ignore a yield hidden behind a function call, since yield on its own does not do quite the same thing there as in Aml or Ruby. That’s when this tweet came along and made me realize how to solve it: ignore the function call only if it does not yield, and implement yield simply as a thrown value. Mechanisms similar to a local method cache could ensure that the invoked function remembers its execution state (only one frame is needed) and resumes it the next time, as if it were encapsulated within a fiber.
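A rough Python sketch of the three runtime rules and of the thrown-value trick follows. It assumes that a single saved frame (here a dict) is enough to resume the yielding function, as described above; the names aml_yield, YieldSignal, NoBlockGiven, count_up and drive are invented for illustration, and Python’s own yield is deliberately not used.

```python
from typing import Any, Callable, Optional


class YieldSignal(Exception):
    """Carries a yielded value up to the loop or workflow consuming it."""

    def __init__(self, value: Any) -> None:
        super().__init__(value)
        self.value = value


class NoBlockGiven(Exception):
    """Analogue of Ruby's LocalJumpError: nobody can receive the yielded value."""


def aml_yield(value: Any, block: Optional[Callable[[Any], Any]] = None,
              in_generator: bool = False) -> Any:
    # Rule 1: the current function was given a block, so pass the value to it.
    if block is not None:
        return block(value)
    # Rule 2: inside a generator expression or loop, throw the value outward.
    if in_generator:
        raise YieldSignal(value)
    # Rule 3: no receiver at all is an error, as in Ruby.
    raise NoBlockGiven("yield called without a block or generator context")


def count_up(frame: dict, limit: int, block: Optional[Callable[[Any], Any]] = None) -> str:
    """Yields 0 .. limit-1. All state lives in `frame`, the one saved frame."""
    while frame.setdefault("i", 0) < limit:
        i = frame["i"]
        frame["i"] = i + 1  # save progress before the value may be thrown
        aml_yield(i, block, in_generator=block is None)
    return "done"


def drive(fn: Callable[..., Any], *args: Any) -> list:
    """Consumer loop: re-enter fn with the same frame, collecting thrown values."""
    frame: dict = {}
    values: list = []
    while True:
        try:
            fn(frame, *args)
            return values
        except YieldSignal as signal:
            values.append(signal.value)
```

The same count_up serves both worlds: called with a block it behaves like a Ruby iterator, and called from drive it behaves like a generator whose values arrive as thrown signals.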

Thursday, 2 April 2015

Intended features

Object-oriented programming

The classic. Classes. Encapsulation. Vertical inheritance.
Nesting. Nested classes, nested definitions.
Everything is an object. Classes included.
Prototypes. Singleton classes. Not-so-singleton classes that can become prototypes.
Parameterized types. Infix types. Existential types. Covariance, contravariance, invariance.
Invariants. Contracts.
Traits/mixins, protocols, abstract classes. Horizontal inheritance. Multiple inheritance.
Accessibility/visibility scopes. Private, protected, public, object-private, module-protected. The usual and the less usual.
Open classes. Sealed classes.
Constructors. Constructors, where the name finally really does not matter. There is no name. Primary constructors. Auxiliary constructors. Designated constructors. Convenience constructors.
Immutability. Mutability. Frozen objects. Immutable instance variables. Class instance variables.
Single dispatch. Multiple dispatch. Multi-methods. Dynamic value dispatch.
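Of the items above, multiple dispatch is perhaps the least familiar to readers coming from single-dispatch languages. A minimal Python sketch of the idea: the implementation is chosen from the runtime classes of all arguments, preferring the most specific matching signature. The MultiMethod class and the shape classes are hypothetical, and the sketch says nothing about how Aml resolves ambiguities.

```python
from typing import Any, Callable


class MultiMethod:
    """Chooses an implementation by the runtime classes of *all* arguments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.implementations: dict[tuple, Callable[..., Any]] = {}

    def register(self, *types: type) -> Callable:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self.implementations[types] = fn
            return fn
        return decorator

    def __call__(self, *args: Any) -> Any:
        candidates = [
            sig for sig in self.implementations
            if len(sig) == len(args) and all(isinstance(a, t) for a, t in zip(args, sig))
        ]
        # Keep the signature that is at least as specific as every other candidate.
        best = [
            sig for sig in candidates
            if all(all(issubclass(s, o) for s, o in zip(sig, other)) for other in candidates)
        ]
        if len(best) != 1:
            kinds = ", ".join(type(a).__name__ for a in args)
            raise TypeError(f"no unique {self.name} method for ({kinds})")
        return self.implementations[best[0]](*args)


class Shape: ...
class Circle(Shape): ...
class Square(Shape): ...

collide = MultiMethod("collide")


@collide.register(Shape, Shape)
def collide_generic(a: Shape, b: Shape) -> str:
    return "generic shapes"


@collide.register(Circle, Square)
def collide_circle_square(a: Circle, b: Square) -> str:
    return "circle meets square"
```

Here collide(Circle(), Square()) picks the specialised method, while collide(Square(), Circle()) falls back to the generic one, because the choice depends on both arguments rather than only on the receiver.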

Functional programming

Everything is a value. Yes, functions are values too.
Lambda expressions. Closures. Pattern matching. Method values.
Referential transparency. Tail-call optimizations.
Call-with-current-continuation. Delimited continuations. Full-scale continuations. Saguaro stacks.
Compile-time and runtime meta-programming. Quasi-quotation. Fundep materialization.
Probably a lot of buzzwords connected with this.
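Tail-call optimization in an interpreter is often done with a trampoline: a call in tail position is returned as a description instead of being performed, and a driver loop runs it, so deep recursion never grows the host stack. The Python sketch below shows that general technique; it is an assumption for illustration, not a description of how Aml implements tail calls, and all names are hypothetical.

```python
from typing import Any, Callable


class TailCall:
    """A call in tail position, returned instead of performed."""

    def __init__(self, fn: Callable[..., Any], *args: Any) -> None:
        self.fn = fn
        self.args = args


def trampoline(fn: Callable[..., Any], *args: Any) -> Any:
    """Run fn, then keep performing returned tail calls in a flat loop."""
    result = fn(*args)
    while isinstance(result, TailCall):
        result = result.fn(*result.args)
    return result


def count_down(n: int, acc: int = 0) -> Any:
    if n == 0:
        return acc
    # Tail position: describe the recursive call rather than making it.
    return TailCall(count_down, n - 1, acc + 1)
```

With this, trampoline(count_down, 100_000) finishes without hitting Python’s recursion limit, since each step returns to the loop before the next one starts.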

Type system

Static typing. Dynamic type for multiple dispatch etc.
Types are values, too.
Dependent types. Works best with immutable and frozen objects.
Union types.
Compound types. Constrained types.
Existential types.
Function types. Partial function types. Curried function types.

Runtime

Dynamic runtime with optimizations available via static typing and more.
Hot-swap code upgrades, per module unit.
Language-agnostic.
Interpreting or just-in-time compiling the AST instead of low-level bytecode.
Interface for native function implementations.

(to be continued…)

Saturday, 28 February 2015

Of the Origin of Aml

I started working on the idea of my own programming language a couple of years ago. At first, I was driven by the lack of certain features in multiple languages; basically, none of them “had it all”. PHP didn’t have proper typing, Ruby had hardly any because it is so dynamic, and other languages, like Java, were just not dynamic enough. It was always like you had to choose between dynamic and static and maybe, if you were a good boy, you got a little something from the other world, like the new dynamic type in C#, or the limited possibilities of reflection in Java.

Also, when it came to creating a new language from scratch, I thought “hell yeah, that is a challenge”. I did consider creating just a transpiler to some existing environment, but that wouldn’t have been good enough for me; it wasn’t flexible enough.

So I started writing an interpreter for the new language in PHP, just to quickly sketch some ideas and then have a base to rewrite in another language. That didn’t work. I realised I needed a guide to follow while designing each part of the language.

So I started writing a specification of the language first. That introduced another set of issues: which format should the documentation use so that it would be sustainable? All those Word/Pages formats were just too high-level, and versioning them meant working with binary files. But before I realised this, I had already written quite a large body of documentation for a very early form of the upcoming language.

That changed a lot when I first met Scala and its documentation. It was written in LaTeX, and that is just the right thing to use when you want the documentation to be properly versioned in git. Reading it, I was also baffled by how many important or smart features my language design was missing. So over the following few months, the language design morphed into something that only distantly resembled the original.

The language itself is now very hybrid in many aspects.

When it comes to typing, it is primarily statically typed, but allows you to opt in to dynamic typing. How does that work? At the variable level. When a variable is bound to a particular type, only values that conform to that type may be assigned to it. When a variable is not bound to any particular type, any value is permitted. And if your function returns a dynamically typed result, you might want to either cast it to a particular type, or evaluate it early and then have its runtime type available. Constructors of objects usually know in advance how many instance variables to make room for. Also, although not really related to the typing discipline, type arguments are reified in the design, unlike in Java or Scala, because even classes are regular objects (with some extra baggage).
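The per-variable opt-in can be sketched in Python as below. In Aml the check on a typed variable would happen mostly at compile time; here it is done at runtime with isinstance as a stand-in, and Variable and cast are hypothetical names.

```python
from typing import Any, Optional


class Variable:
    """A variable bound to a type, or dynamic when declared_type is None."""

    def __init__(self, name: str, declared_type: Optional[type] = None) -> None:
        self.name = name
        self.declared_type = declared_type
        self.value: Any = None

    def assign(self, value: Any) -> None:
        # Typed variables accept only conforming values; dynamic ones accept anything.
        if self.declared_type is not None and not isinstance(value, self.declared_type):
            raise TypeError(
                f"{self.name}: expected {self.declared_type.__name__}, "
                f"got {type(value).__name__}"
            )
        self.value = value


def cast(value: Any, target: type) -> Any:
    """Turn a dynamically typed result into a value of a known type, or fail."""
    if not isinstance(value, target):
        raise TypeError(f"cannot cast {type(value).__name__} to {target.__name__}")
    return value
```

A Variable("count", int) rejects a string, while a Variable("anything") with no declared type takes whatever it is given; cast is the explicit bridge from the dynamic side back to the typed side.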

When it comes to being a static or a dynamic language, it is again hybrid, sort of, but by definition dynamic. Moreover, it takes that to the next level, Erlang-style. The usual functionality of the language is pretty static. But there are easily accessible functions to turn all of this around and manipulate the runtime in a lot of ways: adding classes at runtime, adding new instance variables, adding new methods to selected objects (like Ruby’s singleton classes) or to whole classes (the open class principle), and moreover, upgrading the whole program (or selected compilation units) at runtime. There are of course limitations on what can be done: constraints set by the code must always be met, otherwise errors are raised, and a new version of a compilation unit must never remove a class that still has live instances.
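One of those limitations, that an upgrade must not remove a class which still has live instances, is simple enough to sketch. The Python below assumes the runtime can report a live-instance count per class; check_upgrade, apply_upgrade and the data layout are hypothetical.

```python
def check_upgrade(old_classes: set, new_classes: set, live_counts: dict) -> set:
    """Return the classes an upgrade removes, refusing if any still has live objects."""
    removed = old_classes - new_classes
    orphaned = sorted(name for name in removed if live_counts.get(name, 0) > 0)
    if orphaned:
        raise RuntimeError(
            "upgrade would remove classes with live instances: " + ", ".join(orphaned)
        )
    return removed


def apply_upgrade(units: dict, unit_name: str, new_classes: dict, live_counts: dict) -> None:
    """Hot-swap one compilation unit, all at once, after the check passes."""
    old_classes = units.get(unit_name, {})
    check_upgrade(set(old_classes), set(new_classes), live_counts)
    # Only reached when the check passed, so a rejected upgrade leaves the unit untouched.
    units[unit_name] = dict(new_classes)
```

Dropping a class nobody uses goes through; dropping one that still has objects raises an error and keeps the old version of the unit in place.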

Then there are some smaller parts, like nullability. This feature was heavily influenced by Apple’s Swift. The nil value in Aml is indeed an object, but its class has the special ability to conform to any other class. Still, nil can’t be assigned to just any variable, to prevent unexpected NullPointerException-like errors: there has to be an explicit statement that nil is a legitimate value for that variable, or maybe a pragma that sets that up. There is also an implicit conversion from nil to the None case of the Option type.
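A minimal Python sketch of both rules, assuming Python’s None plays the role of Aml’s nil and a separate NONE object stands for the None case of Option. Slot, Some, NoneOption and to_option are invented names, and in Aml the nullability check would be static rather than the runtime check shown here.

```python
from dataclasses import dataclass
from typing import Any


class NilNotAllowed(TypeError):
    """Raised when nil is assigned to a slot not declared as nullable."""


class Slot:
    """A variable that accepts nil only when explicitly declared nullable."""

    def __init__(self, name: str, nullable: bool = False) -> None:
        self.name = name
        self.nullable = nullable
        self.value: Any = None  # not yet assigned

    def assign(self, value: Any) -> None:
        if value is None and not self.nullable:
            raise NilNotAllowed(f"{self.name} is not declared to accept nil")
        self.value = value


@dataclass(frozen=True)
class Some:
    value: Any


class NoneOption:
    """The empty case of Option (distinct from nil itself)."""

    def __repr__(self) -> str:
        return "None"


NONE = NoneOption()


def to_option(value: Any) -> Any:
    """The nil-to-Option conversion, made explicit here."""
    return NONE if value is None else Some(value)
```

Assigning nil to Slot("name") fails, Slot("nickname", nullable=True) accepts it, and to_option maps nil to NONE and any other value to Some(value).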

Or macros. Influenced by Scala this time, since C-like macros have a lot of drawbacks. Macros in Aml work with the AST representation of source code, not with the source text itself. They also allow creating type providers. And one more interesting thing: they also work at runtime. How? By saving the AST in the bytecode instead of any other compiled representation. The AST is enough for the interpreter to know what to do and to do it efficiently. This allows for optimisations that are not available during compilation. For example, when referentially transparent functions are used properly together with the static typing discipline, whole portions of the AST may be replaced by a single value upon the functions’ first evaluation.
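The optimisation in the last sentence can be sketched as an AST interpreter that rewrites a call node into a constant once it knows the call is pure and all its arguments are constants. In the Python below, a pure flag on each call stands in for what static typing and referential transparency would let Aml prove; the node classes, fold and Program are hypothetical.

```python
from typing import Any, Callable


class Const:
    def __init__(self, value: Any) -> None:
        self.value = value

    def evaluate(self) -> Any:
        return self.value


class Call:
    def __init__(self, fn: Callable[..., Any], args: list, pure: bool = False) -> None:
        self.fn = fn
        self.args = list(args)
        self.pure = pure  # stands in for "known to be referentially transparent"

    def evaluate(self) -> Any:
        return self.fn(*(arg.evaluate() for arg in self.args))


def fold(node: Any) -> Any:
    """Return the node to keep in the tree, replacing pure constant calls by values."""
    if isinstance(node, Call):
        node.args = [fold(arg) for arg in node.args]
        if node.pure and all(isinstance(arg, Const) for arg in node.args):
            # Evaluated once; from now on the Const replaces the call.
            return Const(node.evaluate())
    return node


class Program:
    def __init__(self, body: Any) -> None:
        self.body = body

    def run(self) -> Any:
        # Rewrite the tree in place; later runs see the already-folded AST.
        self.body = fold(self.body)
        return self.body.evaluate()
```

In a tree like add(square(4), next_tick()), the pure square(4) collapses to a constant 16 on the first run and is never called again, while the impure next_tick() stays in the tree and runs every time.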

And of course tagged unions, a feature missing from PHP, Ruby, Java… and a lot more features that will not fit into this blog post.