
Retrofitting OCaml onto Me

2023-07-28

ocaml • personal


I think it is safe to say that over the past three years I've written software almost exclusively in a programming language called OCaml. Most of my contemporaries have suggested either that a certain Cambridge professor has brainwashed me, or that my career goals extend only to strange networking applications, leaving the practical world behind, or working for that OCaml company. I'll let you decide if any of these are true.

All joking aside, it was a conversation with Anil that sparked the desire to write down just what I've been hacking on for the guts of three years now and maybe reflect a little on what I've learned. But first a little backstory.

A Computer Scientist that Hates Computers

I finished my undergraduate degree in Computer Science at Pembroke College in June 2020 having started in October 2017. Even now I still believe the fact that I got in was a mixture of hard work and luck (an emphasis on the latter). I had never done a single Computer Science course and my only knowledge of any programming was through Dan Shiffman's Coding Train. That's not a slight on the Coding Train, it is still my favourite programming resource and I send anyone who asks me about getting started straight to that channel.

When I started in 2017 I was abruptly reminded of just how out of place I felt when our introductory course was taught in Poly/ML. When asked by my fellow college first-years (who are now my good friends) what "OS" I preferred, I didn't know what they were talking about (even with the "Operating System" clarification). Turns out the correct answer was macOS... In second year I was introduced to OCaml via a Compiler Construction course, although the focus was on compilers rather than the slightly obscure programming language.

When everything was all said and done I managed a 2:1. I had good support within my college and occasionally some excellent supervisors (in particular thank you @dra27 and @sadiqj). My final year dissertation was on "Optimisations across Software and Hardware using RISC-V" which amounted to hacking together new RISC-V instructions (such as ocvali which shifts left and adds one...) for OCaml.
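For context on why "shift left and add one" is worth an instruction of its own: OCaml represents an integer n at runtime as the tagged machine word 2n + 1, so this operation is exactly what turns a raw integer into an OCaml value. The sketch below only illustrates that arithmetic in plain OCaml; the names tag and untag are made up for the example, and the exact semantics of ocvali are not spelled out in this post.

```ocaml
(* Illustrative only: the arithmetic behind OCaml's tagged integers.
   [tag] and [untag] are hypothetical names, not part of any library. *)

(* Shift left by one and set the low bit: n becomes 2n + 1. *)
let tag n = (n lsl 1) lor 1

(* An arithmetic shift right drops the tag bit and keeps the sign. *)
let untag v = v asr 1
```

For example, tag 3 is 7, and untag (tag n) gives back n for any n that fits in 62 bits on a 64-bit machine.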

Perhaps hate is too strong, but I've never been particularly interested in computers, hardware, software or theory. I saw computer science as a means of being useful to lots of different things that I am passionate about.

OCaml at Tarides

After graduating during the pandemic, I ended up back in Belfast working remotely for Tarides. This is where my OCaml journey began in earnest. Whilst I managed to cover a diversity of work during my time at Tarides, retrospectively I think I can divide it into three large chunks.

OCaml.org

OCaml has a reputation for not being very well documented. To a certain extent I agree. My first task when joining Tarides back in 2020 was to start documenting workflows. This was broadly categorised by user type (Library Authors, Teachers, End Users, Beginners, Distribution Managers and Application Developers) and platform tool. It's particularly gratifying to link to the new OCaml.org website where the platform tools are now listed. Especially given that it looks a lot better than the first site I rustled together! "Explore OCaml" was the first project I made at Tarides and it was invaluable to my understanding of OCaml and the common workflows of your average OCaml programmer.

A few standout moments for me during this process were:

Opam and Dune

A lot of my work is somewhat branched. I'm not one to shy away from creating a new git repository for an idea, spending hours furiously hacking and then moving on. More often than not those repositories or branches will get rebased onto another project, perhaps years later! One such idea was the Opam and Dune documentation I began to write around this time. For example, a little guide to opam with documentation on switches and compilers.

v3.ocaml.org

Without dwelling too much on the passage of time, it's funny to think there are some OCaml developers who don't know v2.ocaml.org. For a while I helped maintain the site and tried to improve its accessibility. It was also my first real-world introduction to Makefiles.

A group of people from Tarides (myself included) and Solvuu embarked on redesigning and reimplementing the OCaml.org website to help modernise it. It started life in ReScript, which was another good opportunity for learning, but was later rewritten in OCaml to unify the stack. In the long run I think this decision paid off, as OCaml.org is now a shining example of community effort and of using OCaml.

ppx_deriving_yaml

We used a lot of "jekyll formatted" markdown documents for the data in v3.ocaml.org. This meant parsing metadata from yaml into OCaml. For example:

title: string
tags: string list

Might become:

type t = {
    title : string;
    tags : string list;
}

As far as I can remember, this was the first "I'm reaching for a library and OCaml doesn't have it" moment. So I implemented it and in December 2020 I made my first opam release of a package.
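To make the idea concrete, here is a hand-written sketch of the kind of converter pair a deriver saves you from writing for the record above. It is not the code ppx_deriving_yaml generates: the value type is defined locally and is assumed to mirror the shape of the yaml library's Yaml.value, and the string error type is a simplification chosen for the example.

```ocaml
(* A local stand-in for a parsed YAML document (assumed to mirror Yaml.value). *)
type value =
  [ `Null
  | `Bool of bool
  | `Float of float
  | `String of string
  | `A of value list
  | `O of (string * value) list ]

type t = {
  title : string;
  tags : string list;
}

(* Record to YAML: each field becomes a key in a mapping. *)
let to_yaml (t : t) : value =
  `O
    [ ("title", `String t.title);
      ("tags", `A (List.map (fun s -> `String s) t.tags)) ]

(* YAML to record: check the shape and report what went wrong otherwise. *)
let of_yaml : value -> (t, string) result = function
  | `O fields -> (
      match (List.assoc_opt "title" fields, List.assoc_opt "tags" fields) with
      | Some (`String title), Some (`A items) ->
          let rec strings acc = function
            | [] -> Ok (List.rev acc)
            | `String s :: rest -> strings (s :: acc) rest
            | _ -> Error "tags: expected a list of strings"
          in
          Result.map (fun tags -> { title; tags }) (strings [] items)
      | _ -> Error "expected a string field title and a list field tags")
  | _ -> Error "expected a mapping"
```

Writing this by hand for every metadata type gets tedious quickly, which is the gap a deriver fills.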

Continuous Integration

After reaching the first milestones for v3.ocaml.org, I moved to the Continuous Integration (CI) team in Tarides. Most OCaml developers are probably unaware they are using a service maintained by this team, such as the CI that runs on opam-repository. The task for me was to provide container-like sandboxing on macOS workers in the cluster.

As with a lot of the projects and teams I worked on, I started by documenting. In particular, wrapping my head around a docker build-like alternative called OBuilder. It was suggested to use a FUSE filesystem to handle homebrew... the finer details are recorded here. It's certainly a hack but somehow mainly works!

One habit drilled into me from my undergraduate years was to be pretty consistent with taking daily notes. Here's a note on the day the FUSE hack clicked with me.

A messy note describing how FUSE can intercept calls to /usr/local and redirect them to a user's specific home directory in order to get around homebrew being globally installed.
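The core of that redirection is just a path rewrite. The sketch below shows only that mapping as a pure function. It is not the OBuilder implementation, and the per-user "local" subdirectory is an assumption made up for the example.

```ocaml
(* Illustrative path rewrite: send anything under /usr/local to a directory
   inside the given user's home. The "local" subdirectory is hypothetical. *)
let prefix = "/usr/local"

let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Returns [Some rewritten_path] for paths under /usr/local, [None] otherwise. *)
let redirect ~home path =
  let target = home ^ "/local" in
  let plen = String.length prefix in
  if path = prefix then Some target
  else if starts_with ~prefix:(prefix ^ "/") path then
    let rest = String.sub path (plen + 1) (String.length path - plen - 1) in
    Some (target ^ "/" ^ rest)
  else None
```

Matching on "/usr/local/" rather than the bare prefix keeps unrelated paths such as /usr/localx from being redirected.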

I really cut my systems-programming teeth whilst in the CI team. I'm very grateful for the team's patience with this project. A special thanks to @talex5 and @kit-ty-kate for giving up their time to both help and teach me.

Effects & Eio

The joke in the title of this post was perhaps a little subtle, but it is based on two of the core papers behind OCaml 5.

By now branded as a "macOS" person, there was only one logical next project for me whilst working on effects... a macOS backend for Eio! Eio is a portable, direct-style concurrency and IO library for OCaml. I spent copious amounts of time getting my head around the io-uring backend for Linux before beginning to hack on the Grand Central Dispatch (GCD) backend for macOS... this proved tricky and cumbersome, with issues arising from GCD's concurrency and parallelism model.
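For a feel of what "direct-style" means here, the toy below uses OCaml 5's effect handlers (the standard library's Effect module) to run cooperative fibers without any monads or callbacks in user code. It is a minimal sketch and not Eio's API: the Yield effect and the run function are invented for the example, and a real backend would replace the queue with an event loop such as io-uring or kqueue.

```ocaml
(* Requires OCaml 5. A toy round-robin scheduler built on effect handlers. *)
open Effect
open Effect.Deep

(* Hypothetical effect: a fiber asks to be paused and resumed later. *)
type _ Effect.t += Yield : unit Effect.t

let yield () = perform Yield

(* Run all fibers on a single domain, switching whenever one yields. *)
let run (fibers : (unit -> unit) list) : unit =
  let queue : (unit -> unit) Queue.t = Queue.create () in
  let next () =
    match Queue.take_opt queue with Some k -> k () | None -> ()
  in
  let spawn f =
    match_with f ()
      { retc = (fun () -> next ());
        exnc = raise;
        effc =
          (fun (type a) (eff : a Effect.t) ->
            match eff with
            | Yield ->
                Some
                  (fun (k : (a, unit) continuation) ->
                    (* Park this fiber at the back of the queue. *)
                    Queue.push (fun () -> continue k ()) queue;
                    next ())
            | _ -> None) }
  in
  List.iter (fun f -> Queue.push (fun () -> spawn f) queue) fibers;
  next ()

let () =
  let fiber name () =
    print_endline (name ^ ": step 1");
    yield ();
    print_endline (name ^ ": step 2")
  in
  run [ fiber "a"; fiber "b" ]
```

The fibers read as ordinary sequential code, yet the two interleave: both print their first step before either prints its second.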

By the end of my time working on Eio, I had written five work-in-progress backends which are still lying around or have been merged and since improved:

  1. The initial GCD backend, this would still likely be needed should anyone want to use Eio on iOS.
  2. An alternative kqueue backend for macOS (and maybe the BSDs) which is more promising.
  3. An initial IOCP backend for Windows copying in large part the io-uring backend.
  4. A Unix.select backend for Windows which is the basis of eio_windows inspired by eio_posix.
  5. A js_of_ocaml backend for writing browser applications with Eio, with some examples. It was actually in my spare time that I revived the jsoo-effects.

It is time for another sincere thank you to @talex5 for patiently reviewing my (in need of guidance and direction) PRs.

In my spare time I ported Irmin and OCurrent to Eio. This was a nice way to take a break from the internal system-programming frustration and just see if the API was any good. I then built try-eio to let others get a feel for it with no setup required on their part.

To help debug programs, I started hacking on a monitoring tool for Eio called meio. @TheLortex then took this and made it many, many times better!

Bonus Round: Outreachy

During my time at Tarides, a group of us (primarily led by Anil) decided to reboot the OCaml community's participation in Outreachy!

Outreachy provides internships in open source and open science. Outreachy provides internships to people subject to systemic bias and impacted by underrepresentation in the technical industry where they are living.

I think this is by far the most rewarding part of my time at Tarides. For the first round we used v3.ocaml.org as the host project and were inundated by contributors and applications. A massive thanks to my fellow mentors Sonja, Gargi, Isabella and Anil.

From there, although at times quite time-consuming, I think Outreachy in OCaml only went from strength to strength. And it turns out the OCaml community has quite a long history of supporting these initiatives. See the Outreachy page on OCaml.org for more information and see here for my participation and projects.

Mirage and Irmin

I didn't limit myself to just the projects I was working on in Tarides during my time there. I was very fortunate and privileged enough to be able to spend quite a lot of my free time hacking on OCaml projects that interested me. One of those was Irmin.

Irmin

Irmin is a key-value store with a twist.

Irmin is based on distributed version-control systems (DVCs), extensively used in software development to enable developers to keep track of change provenance and expose modifications in the source code. Irmin applies DVC's principles to large-scale distributed data and exposes similar functions to Git (clone, push, pull, branch, rebase).

After getting my head around some of the insane functor gymnastics that allows Irmin to make zero assumptions about the concrete type of anything, I developed some toy projects to help understand it better.

I also had the great pleasure to mentor Odinaka Joy (an intern at Tarides and ex-Outreachy mentee) on getting "Irmin in the browser" with irmin-server. We both learnt a lot from this project.

Hacking on Irmin led me to the "content-addressed" world and projects like IPFS. I like the design of some of their core protocols and formats, which emphasise the importance of self-description. I implemented and released a few OCaml libraries for them including multihash, multibase and cid.
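"Self-description" is easiest to see in the byte layout. A multihash prefixes a digest with a varint naming the hash function and a varint giving the digest length, and a multibase string prefixes encoded bytes with a character naming the encoding. The sketch below follows those layouts using only the standard library. It is not the API of the multihash or multibase packages mentioned above: the function names are invented for the example, and the digest is taken as an input rather than computed.

```ocaml
(* Unsigned varint (LEB128), for non-negative integers:
   7 bits per byte, high bit set on every byte except the last. *)
let varint n =
  let buf = Buffer.create 4 in
  let rec go n =
    if n < 0x80 then Buffer.add_char buf (Char.chr n)
    else begin
      Buffer.add_char buf (Char.chr ((n land 0x7f) lor 0x80));
      go (n lsr 7)
    end
  in
  go n;
  Buffer.contents buf

(* Code for sha2-256 in the multicodec table. *)
let sha2_256 = 0x12

(* Multihash layout: <hash function code><digest length><digest bytes>. *)
let multihash ~code digest =
  varint code ^ varint (String.length digest) ^ digest

let to_hex s =
  String.concat ""
    (List.map
       (fun c -> Printf.sprintf "%02x" (Char.code c))
       (List.of_seq (String.to_seq s)))

(* Multibase layout: one prefix character, here 'f' for lowercase base16. *)
let multibase_base16 bytes = "f" ^ to_hex bytes
```

A reader of the resulting string needs no out-of-band information: the leading f says hex, 12 says sha2-256 and 20 says 32 bytes of digest follow.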

Mirage

Mirage is a framework for building "systems".

MirageOS is a library operating system that constructs unikernels for secure, high-performance network applications across a variety of cloud computing and mobile platforms.

My experience with Mirage might be better summed up as

Reifying Internet Engineering Task Force RFCs in Pure OCaml

The high-level tools like mirage help construct the unikernel, but I was always more interested in using mirage as a conduit (that is a very niche joke, if you know you know) for learning how the internet works. To this end I decided to embark on the not so small task of "implementing WebRTC" in pure OCaml...

No surprises that I didn't come close, but I learnt a lot on the journey. For example, here are some partial implementations of an assortment of protocols:

Conclusion

I could go on and on. We haven't even discussed geocaml or watch.ocaml.org. I procrastinated for about half a year before writing this post. I'm glad that I finally got around to doing it, even if just selfishly: it was fun and insightful to look back at how far I've come as an OCaml developer.

The biggest takeaway whilst writing this was the number of thank yous. My experience in OCaml was primarily driven by the people I interacted and collaborated with. I've probably missed some people, apologies.

As for what's next, I'll be starting a PhD at the intersection of Computer Science, Ecology and Conservation in Cambridge this October. I'm looking forward to focusing on other things I'm passionate about like how technology and climate change impact human rights. I'm sure there will be plenty of OCaml programming too...

If you made it this far, thank you.