Sunday, January 17, 2016

Aml/System Plans

A side note: I have indefinitely postponed any development of the Aml# and Aml#/System "dialects". It does not make much sense to create these at the moment; once Aml Core is a viable SDK, then maybe, maybe.

The current plan for Aml/System is as follows:

  1. Write a Rust program that would be the stage 0 Aml/System compiler. It does not have to do much; it does not need to compile all of Aml/System's features and libraries, just the most basic things needed for a stage 1 compiler to compile. For that, decide what this minimal subset of Aml/System is. Probably the only way to do that is to simultaneously write a stage 1 compiler. (This is where we're at now.)
  2. Write a stage 1 compiler, using a subset of Aml/System. This compiler will have to be able to compile every Aml/System feature, including any possible libraries, and including itself. 
  3. Write a stage 2 compiler, using the full feature set of Aml/System. This compiler will then be completely self-hosted, being able to compile itself (while its first build will be compiled by the stage 1 compiler). It's questionable whether this step is necessary, or whether there will have to be more stages to incrementally get to a stage N compiler, where N is the final compiler stage written in the same language that it compiles.
It is possible that between writing the stage 1 and stage 2 compilers, there will be a switch in technology regarding the intermediary output. Compilers up to stage 1 will very likely just transpile to C or C++ (not sure yet, depending on how clang and msvc behave), while stage 2 may drop the C/C++ intermediary form and go straight for an Aml/System IR to LLVM IR transformation. That switch can happen while already happily building the runtime for Aml/Core using a stage 1 compiler, because the runtime does not care whether its compiler is self-hosted or not.
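To make the transpilation idea concrete, here is a minimal sketch (in Rust, with entirely invented names – nothing here is the actual stage 0 compiler) of what lowering a toy expression AST to C source could look like:

```rust
// Hypothetical sketch of a stage-0 "transpile to C" step: a tiny expression
// AST is lowered to a C source string, which would then be handed off to
// clang or msvc. All type and function names are invented for illustration.

#[derive(Debug)]
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Recursively emit a C expression for one AST node.
fn emit_c(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Add(a, b) => format!("({} + {})", emit_c(a), emit_c(b)),
        Expr::Mul(a, b) => format!("({} * {})", emit_c(a), emit_c(b)),
    }
}

// Wrap the expression in a C entry point; a real compiler would also emit
// declarations, runtime hooks, etc.
fn emit_program(body: &Expr) -> String {
    format!(
        "#include <stdio.h>\nint main(void) {{ printf(\"%lld\\n\", (long long){}); return 0; }}\n",
        emit_c(body)
    )
}

fn main() {
    let ast = Expr::Add(
        Box::new(Expr::Int(2)),
        Box::new(Expr::Mul(Box::new(Expr::Int(3)), Box::new(Expr::Int(4)))),
    );
    println!("{}", emit_program(&ast));
}
```

The emitted file would then be compiled and linked by the host toolchain, which is exactly why the cc/gcc/link.exe dependency mentioned below sticks around.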

The dependency on cc, gcc or link.exe is something Aml/System is not likely to get rid of while targeting any of the operating systems currently in service (until plan-89 comes to life).

Note that switching from clang/msvc-backed program building to LLVM IR code generation presents more challenges than it may seem:
  • Learn LLVM IR, while LLVM itself is in active development and may change in a backwards-incompatible manner at any time. The obvious solution is to fork it and only merge changes when ready.
  • Learn how to interoperate with the libc etc. from within LLVM IR. 
  • Learn code optimizations, since clang/msvc will no longer do that for us. 
  • Learn archive packaging (I have heard rumours that the Windows archive tool has some trouble with non-object members of archive files). We might need to use llvm-ar for that instead.
  • Also, true not just for this switch – support debugging interfaces. Aml/Core will have a very different runtime from Aml/System, and we can design the debugging interface for Aml/Core ourselves, since probably nobody would ever use an existing one for the job. Remember that this runtime is also very different from the regular stack-based runtimes of other high-level languages – unlike the runtime of Aml/System, unfortunately. Maybe plan-89 will allow fancier code to be written in Aml/System; we'll see.
P.S.: Aml/System, as well as Aml/Core and all of their docs, are written as open source under the MIT license, so you can use them however you wish – but it would be really appreciated if you could just join the efforts and help out with the job.

Thursday, January 14, 2016

Future plans

While Aml is still undergoing a lot of development, this blog did not get the same amount of attention. So, to make amends, here is some news. Aml has basically split into a few sub-projects, specifically a family of languages:

  • Aml Core, which is the former Aml as it was designed.
  • Aml#, a dialect of Aml that may or may not come to life, but it's intended to be an even more ML-like language.
  • Aml S, another Aml dialect, more likely to come alive – intended to be to Aml what Clojure is to Java, simply put. Also as a powerful DSL for configs on steroids. 
  • Aml System, almost the same language as Aml Core, but limited and with a different runtime – intended to be to Aml Core something like what C is to C++.
  • Aml# System, the same, but with the more ML-like appearance. 
Work had started on the Aml Core runtime using the C++ and Rust languages (as separate projects, to see which one would work best for the cause). That now seems like a major mistake. The new intention is to write the Aml Core runtime in the Aml System language, so it can feel more self-hosted, although with two separate runtimes.

To build Aml System though, one has to make a long journey. That is, we can’t really write Aml System in Aml System yet, since we don’t have any Aml System compiler at hand. The process will likely look something like this:
  1. Write an initial Aml System compiler in Rust or D (probably Rust, though). The key requirement is that the host language has nice support for unit testing, so that we can safely rely on the Aml System compiler's results – and imho, Rust fulfills that better. Another option could be OCaml (I have heard that Rust used it for its own initial compiler).
  2. Now that we have an Aml System compiler, we can rewrite the Aml System compiler using the previously built version of itself.
  3. At this point, the Aml System compiler has become self-hosting.
  4. Write the Aml Core runtime in Aml System. Native functions in Aml Core can also be written in Aml System, and since the general syntax of the two languages is the same, the native source code may be embedded directly in Aml Core source files.
  5. There is no need to write Aml Core runtime in Aml Core, because the language is interpreted. 
  6. Now that the MVP of Aml is finished, we can focus efforts on Aml S, whose runtime can in fact be written in both Aml System and Aml Core, or maybe an intelligent combination of both.
There are also some phases and challenges in the implementation of Aml System that might be worth mentioning. The main question is: what should the output of the initial Aml System compiler be?
  • C source code, or some other low-level language source code. 
    • Could leverage existing clang and LLVM optimizations. 
  • LLVM IR. 
    • More freedom in implementation, more work. 
Another question is how the binaries compiled by the Aml System compiler would work in the host system. It could link to some libc and libm, like Rust does, or it could ship its own version of those and interface with the OS directly. So many questions. The most likely path is that the initial version will simply compile to C code and link to a libc. Then, since the output will still be a bunch of native-code object files (maybe in archives, dynamic libraries, etc.), the compiler can later switch to LLVM IR, skip the C part, and use only the system SDK's linker. That would be nice (and is kind of what Rust does, so I believe that is the final phase for Aml/System to get to).
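For comparison with the transpile-to-C option, the later phase could look like this minimal sketch: the emitted text is real textual LLVM IR, but the lowering function itself is invented for illustration.

```rust
// Hypothetical sketch of the "emit LLVM IR directly" phase: instead of going
// through C, the compiler writes textual LLVM IR and leaves only linking to
// the system toolchain. The function name is invented; the IR syntax is real.

fn emit_llvm_main(ret: i32) -> String {
    // A trivial module: `main` returns a constant i32.
    format!(
        "define i32 @main() {{\nentry:\n  ret i32 {}\n}}\n",
        ret
    )
}

fn main() {
    // This output could be fed to `llc` (or llvm-ar for archives) and then
    // to the platform linker.
    println!("{}", emit_llvm_main(0));
}
```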

Sunday, May 24, 2015

Informally about the recent changes

First of all, there came the change of name. This programming language used to be called Coral, but the knowledge that there is another language sharing part of that name (CORAL 66) kept bugging me. After all, I'm not Apple Inc.; I can't afford to give a language the exact same name as an existing one, as they did with Swift (yes, there are now two languages with the exact same name). And short names of gemstones are already occupied by Ruby. So I kept thinking it over while working on now-former-Coral, and finally came up with an idea: Gear. A tiny wheel inside a clockwork that gives the machine life. I thought that was pretty cool, and one letter shorter than Coral. In this early phase, it took just a few hours to rename the whole language. Then the author of a language of the same name contacted me, and I abandoned the name Gear in favour of Amlantis, Aml for short, which has some properties that the name Gear did not have.

But the early phase is slowly coming to an end, and the language is maturing very fast.

A few examples of the changes that were made in the past few days:

Comments

Comments are pretty important, no matter what your clean code teacher tells you. They express ideas that are not expressed by interfaces. Comments are like embedded metadata of your source code. If you have more such metadata than actual source code, you might be doing it wrong, though. Anyway, there were recently three changes regarding the syntax of comments. First, the /* and */ multi-line comments were replaced with OCaml-y (* and *), and then single-line comments went from // to ;; to no single-line comments at all! Why? Because I realized I needed ;; for something else, and the inspiration from Lisps, where ; introduces all comments, just was not enough. Also, I needed // for an operator. Now comments are distinct; there is just one kind of comment (plus its documentation version, (*! and *)). Moreover, it's now pretty easy to turn a single-line comment into a multi-line comment, because, well, the single-line comment already uses the multi-line comment syntax. Yay!
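For illustration, here is a sketch of how a lexer might skip such (* ... *) comments, including nesting as in OCaml. The helper below is invented for this post, not actual Aml tooling:

```rust
// Invented helper: remove OCaml-style (* ... *) comments from a source
// string, tracking nesting depth so (* outer (* inner *) outer *) is
// consumed as one comment.

fn strip_comments(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::new();
    let mut depth = 0usize; // how many unclosed (* we are inside
    let mut i = 0;
    while i < chars.len() {
        if i + 1 < chars.len() && chars[i] == '(' && chars[i + 1] == '*' {
            depth += 1; // comment opener, possibly nested
            i += 2;
        } else if depth > 0 && i + 1 < chars.len() && chars[i] == '*' && chars[i + 1] == ')' {
            depth -= 1; // close the innermost comment
            i += 2;
        } else {
            if depth == 0 {
                out.push(chars[i]); // only code outside comments survives
            }
            i += 1;
        }
    }
    out
}

fn main() {
    println!("{}", strip_comments("let x = 1 (* single-line, (* or nested *) *)"));
}
```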

Workflows

This is a change that is in fact still in progress. I realized there were some parts of it in the former version that just did not quite do what they were supposed to do, so I went back to the source of the idea, F#, and studied its specification of what its workflows (called computation expressions there) do. And now I'm redoing the whole thing for Aml. Unfortunately for me, F# has some syntax goodies that Aml does not have, and Aml has some syntaxes that F# obviously does not have, so I have to decide where the workflows will be applied and how. The key difference between workflows in Aml and computation expressions in F# is that Aml has to be able to do the translations at runtime, via runtime macros.

Vendors & Modules

I always wanted to make this part right, but could not quite get it right in Coral. Now, with Aml, I thought really hard about what the real use cases could be, and finally decided on a change in Aml's syntax regarding modules and vendors. See, a vendor in Aml's context is somebody, or some organization, that ships its modules to other users, either in compiled binary form, in source form, or maybe even both. And a module could be a lot of things – it could be just a library, a utility, a complex GUI application, or even a combination of those.

Originally, I thought it would be necessary to distinguish a module, a vendor and a class in any path. So I came up with the Module~[Vendor].Class syntax, where the vendor just added some extra property to the module name; the language could then list the referenced modules in one place using their full names including vendors, and use the simple names everywhere else. But that syntax is pretty cumbersome, lengthy, and not really easy to write fast. So I thought… well, on Packagist or GitHub, "modules" are referred to using something like Vendor/Module, so why not use that? A slash character was already an allowed token within an identifier, so that would only lead to deletion of the extra syntax, and the rest would stay the same.

And then I realized that while a slash character is indeed allowed in identifiers, it is allowed in ALL identifiers – not just module names, but classes, functions, methods, damn, even variables. Wouldn't that be a problem in name resolution? And then I finally realized it would be no more of a problem than it already had been. Module names, including their vendors, will already be imported from the module definition; other places must obviously import the name themselves, as the language can't do that for them.
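As a purely hypothetical illustration of the Vendor/Module idea (all names invented, and glossing over the fact that slashes may appear in other identifiers too), resolving a fully qualified name could look like:

```rust
// Invented sketch: split a name like "Acme/Http.Client" into vendor, module
// and an optional member (class, function, ...) part.

#[derive(Debug, PartialEq)]
struct ModulePath {
    vendor: String,
    module: String,
    member: Option<String>,
}

fn parse_path(s: &str) -> Option<ModulePath> {
    // Vendor and module are separated by '/'; the member by the first '.'
    // after the module name. No '/' means it is not a vendored module path.
    let (vendor, rest) = s.split_once('/')?;
    let (module, member) = match rest.split_once('.') {
        Some((m, mem)) => (m, Some(mem.to_string())),
        None => (rest, None),
    };
    Some(ModulePath {
        vendor: vendor.to_string(),
        module: module.to_string(),
        member,
    })
}

fn main() {
    println!("{:?}", parse_path("Acme/Http.Client"));
}
```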

Syntactic Forms

Again, another syntax that was borrowed from Lisps and did not quite fit into Aml. But it survived, just with a different set of delimiters: <@ quasi-quote @> and <@@ quote @@>. For now, I kept a simple '' syntax to quote expressions that are easily quoted. The new syntax fits in with other similar syntaxes that need to draw attention – like goto <<labels>>, which comes from Ada, just customized.

Multi-purpose Yield

I spent quite some time thinking about this, and came to realize it was necessary. Originally, yield was supposed to do what it does in Ruby – pass arguments to a block (basically a lambda) that was given to the invoked function. But it had some extra usage in generator expressions and loops. Also, there is this Fiber.yield thing that is planned for Aml, and indeed it does again what it does in Ruby, maybe even more. So that makes up three different purposes for the yield keyword, if we ignore things like Thread.yield, which does not accept any value and might have an outcome somewhat similar to Fiber.yield, but not quite the same. Then I quickly tested what yield does in Ruby if no block was given. It is an error condition. Ha! Then it was easy to write down a list of simple runtime-checkable rules that would apply to the usage of yield in the three different cases. Just search for yield expressions in the TOC of Aml's specification. But it did not go without trouble – workflows. What if, say, we use a function that yields for a loop, but that function is used within a workflow expression? I checked what F# has to say about that in its specification, and it seems that for the seq<_> thing, it would happily ignore a yield hidden behind a function call, since, well, a standalone yield there does not do quite the same thing as in Aml or Ruby. That's when a tweet came along and I realized how to solve that: ignore the function call only if it does not yield – and implement the yield simply using a thrown value. A mechanism similar to a local method cache could ensure that the invoked function remembers its execution state (only one frame is needed) and resumes it the next time, as if it were encapsulated within a fiber.
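The "only one frame is needed" idea can be sketched as a state machine that remembers which yield point it reached and resumes from there on the next call – a hand-rolled stand-in for a fiber. This is an illustrative Rust sketch with invented names, not Aml's actual mechanism:

```rust
// Invented sketch: a function body with three `yield` points, compiled by
// hand into a struct. The single saved "frame" is just the `state` counter
// recording which yield point we are at; `resume` continues after it.

struct Counter {
    state: u32,
}

impl Counter {
    fn new() -> Self {
        Counter { state: 0 }
    }

    // Each call resumes after the previous yield; None means the function
    // body has run to completion.
    fn resume(&mut self) -> Option<i32> {
        let out = match self.state {
            0 => Some(10), // first `yield 10`
            1 => Some(20), // then `yield 20`
            2 => Some(30), // then `yield 30`
            _ => None,     // body finished
        };
        self.state += 1;
        out
    }
}

fn main() {
    let mut gen = Counter::new();
    while let Some(v) = gen.resume() {
        println!("yielded {}", v);
    }
}
```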

Thursday, April 2, 2015

Intended features

Object-oriented programming


  • The classic. Classes. Encapsulation. Vertical inheritance.
  • Nesting. Nested classes, nested definitions. 
  • Everything is an object. Classes included. 
  • Prototypes. Singleton classes. Not-so-singleton classes that can become prototypes. 
  • Parameterized types. Infix types. Existential types. Covariance, contravariance, invariance. 
  • Invariants. Contracts. 
  • Traits/mixins, protocols, abstract classes. Horizontal inheritance. Multiple inheritance. 
  • Accessibility/visibility scopes. Private, protected, public, object-private, module-protected. The usual and the less usual. 
  • Open classes. Sealed classes. 
  • Constructors. Constructors, where the name finally really does not matter. There is no name. Primary constructors. Auxiliary constructors. Designated constructors. Convenience constructors. 
  • Immutability. Mutability. Frozen objects. Immutable instance variables. Class instance variables. 
  • Single dispatch. Multiple dispatch. Multi-methods. Dynamic value dispatch. 

Functional programming

  • Everything is a value. Yes, functions are values too. 
  • Lambda expressions. Closures. Pattern matching. Method values. 
  • Referential transparency. Tail-call optimizations. 
  • Call-with-current-continuation. Delimited continuations. Full-scale continuations. Saguaro stacks.
  • Compile-time and runtime meta-programming. Quasi-quotation. Fundep materialization. 
  • Probably a lot of buzzwords connected with this. 

Type system

  • Static typing. Dynamic type for multiple dispatch etc. 
  • Types are values, too. 
  • Dependent types. Works best with immutable and frozen objects. 
  • Union types. 
  • Compound types. Constrained types. 
  • Existential types. 
  • Function types. Partial function types. Curried function types. 

Runtime

  • Dynamic runtime with optimizations available via static typing and more. 
  • Hot-swap code upgrades, per module unit. 
  • Language-agnostic. 
  • Interpreting or just-in-time compiling AST instead of low-level bytecode. 
  • Interface for native function implementations. 
(to be continued…)

Saturday, February 28, 2015

Of the Origin of Aml

I started working on the idea of my own programming language a couple of years ago. At first, I was driven by the lack of certain features in multiple languages; basically, none of them "had it all". PHP didn't have proper typing, Ruby had kind of none, as it is so dynamic, and other languages were just not dynamic enough, like Java. It was always like you had to choose between dynamic and static, and maybe, if you were a good boy, you got a little something from the other world, like the new dynamic variable type in C#, or the limited possibilities of reflection in Java.

And I also thought, "hell yeah, that is a challenge," when it came to creating a new language from scratch. I did consider creating just a transpiler to some existing environment, but that wouldn't have been good enough for me, not flexible enough.

So I started writing a PHP interpreter for the new language, just to quickly sketch some ideas and then have a base to rewrite in another language. That didn't work. I realised I needed some guide to follow while designing each part of the language.

So I started creating a specification of the language first. That introduced another set of issues: which format to use for the documentation, so that it would be sustainable? All those Word/Pages formats were just too high-level, and versioning those files meant working with binary files. But before I realised this, I had quite a large body of documentation for a very early form of the upcoming language.

That changed a lot when I first met Scala and its documentation. ’Twas using \LaTeX, and that is just the right thing to use when I wanted the documentation to be properly versioned in git. Also from reading it, I was baffled by how many important or smart features my language design was missing. So over the few following months, the language design morphed into something that only distantly resembled the original design.

The language itself is now very hybrid in many aspects.

When it comes to typing, it is primarily statically typed, but allows opting in to dynamic typing. How does that work? At the variable level. When a variable is bound to a particular type, only values that conform to that type may be assigned to it. When a variable is not bound to any particular type, any value is permitted. Or, if your function returns a dynamically typed result, you might want to either cast it to a particular type, or evaluate it early and then have its runtime type available. Constructors of objects usually know in advance how many instance variables to make room for. Also, not really related to the typing discipline, but type arguments are reified in the design, unlike in Java or Scala, because even classes are regular objects (with some extra baggage).
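A rough model of this opt-in typing, sketched in Rust with invented names: a binding either carries an expected runtime type tag (static-ish) or accepts anything (dynamic).

```rust
// Invented sketch of per-variable opt-in typing: a typed binding rejects
// values of the wrong runtime type; an untyped binding accepts any value.

#[derive(Debug)]
enum Value {
    Int(i64),
    Str(String),
}

struct Binding {
    // None = dynamically typed; Some(tag) = bound to one runtime type.
    expected: Option<&'static str>,
    value: Option<Value>,
}

impl Binding {
    fn typed(tag: &'static str) -> Self {
        Binding { expected: Some(tag), value: None }
    }
    fn dynamic() -> Self {
        Binding { expected: None, value: None }
    }
    fn assign(&mut self, v: Value) -> Result<(), String> {
        let tag = match &v {
            Value::Int(_) => "Int",
            Value::Str(_) => "Str",
        };
        match self.expected {
            Some(want) if want != tag => Err(format!("expected {}, got {}", want, tag)),
            _ => {
                self.value = Some(v);
                Ok(())
            }
        }
    }
}

fn main() {
    let mut x = Binding::typed("Int");
    println!("{:?}", x.assign(Value::Int(42)));            // Ok(())
    println!("{:?}", x.assign(Value::Str("oops".into()))); // type error
    println!("{:?}", x.value);
}
```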

When it comes to being a static or a dynamic language, it is again a hybrid of sorts, but by definition dynamic. And moreover, it takes that to the next level, Erlang-style. The usual functionality of the language is pretty static, but there are easily accessed functions to turn this all around and manipulate the runtime in a lot of ways: adding classes at runtime, adding new instance variables, adding new methods to selected objects (like Ruby's singleton classes) or to whole classes (the open class principle), and moreover, upgrading the whole program (or selected compilation units) at runtime. There are indeed limitations on what can be done: e.g., constraints set by the code must always be met, otherwise errors are raised, and a new version of a compilation unit must never remove a class that still has some member values in existence.
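The open class principle can be sketched by treating a class as a runtime value holding a method table, so reopening it later is just another insert. This is an invented Rust model, not Aml's actual object system:

```rust
use std::collections::HashMap;

// Invented sketch: a "class" is a runtime value with a mutable method table,
// so methods can be added after the class already exists (open classes).

type Method = fn(i64) -> i64;

struct Class {
    methods: HashMap<String, Method>,
}

impl Class {
    fn new() -> Self {
        Class { methods: HashMap::new() }
    }
    // Reopening the class is just another insert into the table.
    fn define(&mut self, name: &str, m: Method) {
        self.methods.insert(name.to_string(), m);
    }
    // Dynamic dispatch by name; None models a missing-method error.
    fn call(&self, name: &str, arg: i64) -> Option<i64> {
        self.methods.get(name).map(|m| m(arg))
    }
}

fn main() {
    let mut num = Class::new();
    num.define("double", |x| x * 2);
    println!("{:?}", num.call("double", 21)); // Some(42)
    // Reopen the class later and add another method:
    num.define("negate", |x| -x);
    println!("{:?}", num.call("negate", 5)); // Some(-5)
}
```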

Then there are some smaller parts, like nullability. This feature was highly influenced by Apple's Swift. The nil value in Aml is an object, indeed, but its class has the special ability of conforming to any other class. It can't be assigned to just any variable, though – to prevent unexpected NullPointerException-like errors, there has to be an explicit statement that nil is a legitimate value for that variable, or maybe a pragma that sets that up. Also, there is an implicit conversion from nil to the None Option type.
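A minimal model of these rules, with invented names: nil is a distinct case that a plain value can never carry, and it converts to Option's None.

```rust
// Invented sketch: only a binding explicitly declared nilable (Nilable<T>)
// can hold nil; a plain T never can. nil maps onto Option's None case.

#[derive(Debug, PartialEq)]
enum Nilable<T> {
    Nil,
    Value(T),
}

// The implicit nil -> None conversion mentioned in the post.
fn to_option<T>(v: Nilable<T>) -> Option<T> {
    match v {
        Nilable::Nil => None,
        Nilable::Value(t) => Some(t),
    }
}

fn main() {
    let a: Nilable<i32> = Nilable::Nil;
    let b = Nilable::Value(5);
    println!("{:?} {:?}", to_option(a), to_option(b)); // None Some(5)
}
```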

Or macros. Influenced by Scala this time, since C-like macros have a lot of drawbacks. Macros in Aml work with the AST representation of the source code, not the source code itself. They also make it possible to create type providers. And one more interesting thing – they also work at runtime. How? By saving the AST in the bytecode, rather than any other compiled representation. The AST is enough for the interpreter to know what to do and to do it efficiently. This allows for optimisations that are not available during compilation. E.g., when referentially transparent functions are used properly together with the static typing discipline, whole portions of the AST may be replaced by a single value upon the functions' first evaluation.
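That first-evaluation replacement can be sketched like this (Rust, invented names): evaluating a pure subtree rewrites that node in place into a literal holding its value, so the next interpretation is a constant lookup.

```rust
// Invented sketch: an interpreter that, on first evaluation of a
// referentially transparent subtree, replaces the subtree with a literal.

#[derive(Debug, PartialEq)]
enum Ast {
    Lit(i64),
    // A pure operation whose result can safely be cached in the tree.
    PureAdd(Box<Ast>, Box<Ast>),
}

fn eval(node: &mut Ast) -> i64 {
    match node {
        Ast::Lit(n) => *n,
        Ast::PureAdd(a, b) => {
            let v = eval(a) + eval(b);
            // Referential transparency makes this rewrite safe: next time,
            // this whole subtree is a single literal.
            *node = Ast::Lit(v);
            v
        }
    }
}

fn main() {
    let mut ast = Ast::PureAdd(Box::new(Ast::Lit(2)), Box::new(Ast::Lit(3)));
    println!("{}", eval(&mut ast)); // 5
    println!("{:?}", ast);          // Lit(5)
}
```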

And indeed tagged unions, a feature that is missing from PHP, Ruby, Java… and a lot more features that will not fit into this blog post.