Final year undergraduate dissertations and MPhil theses.
The design and implementation of a jq-like EDSL allowing programmers to abstract away the format in which data is stored (YAML, EXIF metadata etc.) and the file system being used – e.g.
open("a/b/[c,d]")
could mean "fields c, d in YAML file b of ZIP file a", or it could mean "fields b.c, b.d in JSON file a". My project would statically examine the filesystem to decide which case we're in, giving data scientists an easy way to declare the shape of the data they depend on. Success criteria include projection (correctly reading the data based on the file type), subtyping (e.g. JSON-LD is a child of JSON) and perhaps some compiler optimisations related to parsing (e.g. only loading a JSON library if the file is actually JSON).
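The ambiguity described above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the project's design: the filesystem is modelled as a nested dictionary, a "file" is a string of JSON text (standing in for YAML, since the Python standard library has no YAML parser), and the names `parse_path`, `decode` and `open_path` are hypothetical. The sketch also resolves the path dynamically, whereas the project proposes to examine the filesystem statically.

```python
import json


def parse_path(path):
    """Split 'a/b/[c,d]' into (['a', 'b'], ['c', 'd'])."""
    *segments, last = path.split("/")
    if last.startswith("[") and last.endswith("]"):
        fields = [f.strip() for f in last[1:-1].split(",")]
    else:
        fields = [last]
    return segments, fields


def decode(node):
    """Treat a string as a file's contents and parse it; leave containers alone."""
    # Assumption: files hold JSON text. Parsing happens only when a path
    # segment actually enters the file.
    return json.loads(node) if isinstance(node, str) else node


def open_path(root, path):
    """Project the requested fields, whatever mix of containers and files is crossed."""
    segments, fields = parse_path(path)
    node = root
    for segment in segments:
        node = decode(node)[segment]
    node = decode(node)
    return {field: node[field] for field in fields}


# Case 1: 'a' is a container (standing in for a ZIP archive) holding file 'b'.
archive_case = {"a": {"b": '{"c": 1, "d": 2}'}}
# Case 2: 'a' is itself a file whose document has an object under key 'b'.
single_file_case = {"a": '{"b": {"c": 3, "d": 4}}'}

print(open_path(archive_case, "a/b/[c,d]"))
print(open_path(single_file_case, "a/b/[c,d]"))
```

The same path expression yields the fields `c` and `d` in both layouts; only the point at which the path crosses from the container into a file's contents differs.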
Algebraic effects are becoming increasingly common, but it can be difficult to use them in parallel computations. Xie et al. (2024) present a new structure for effect handlers which allows parallel subcomputations to perform effects independently without synchronisation. I plan to implement the traverse handler in OCaml 5 and use a variety of benchmark programs to compare its performance against equivalent programs that do not use effects. As extensions, I will devise similar handlers for other types of parallelism, possibly including a more general form which works for many types simultaneously, and investigate how to improve the handlers' real-world performance.
Co-supervisor: Michael Dales
Geospatial processing is critical in environmental science and is used in a range of environmental scenarios, both in carbon accounting, such as digital Monitoring, Reporting and Verification in Reforestation Carbon Removal, and in biodiversity analysis, such as calculating worldwide extinction rates.
However, there are problems with the current state of affairs. Python is one of the most popular languages for data science, and geospatial data science in Python typically relies on the GDAL geospatial data library. This workflow is not ideal [2]. For environmental scientists, interacting with geospatial data in the imperative way that Python requires can be very difficult, and the workflow is not designed for the scale of computation needed for large geospatial projects. Furthermore, Python's dynamic type system means that tricky type errors due to heterogeneity of data points can go unchecked. GDAL also only allows synchronous IO for loading GeoTIFF files, which can hurt performance.
I will write a non-blocking library for reading GeoTIFF files in OCaml (built on top of the work-in-progress OCaml-TIFF library written by one of my supervisors, Patrick Ferris). I will then use OCaml to write an embedded Domain Specific Language (eDSL) for interacting with geospatial data in a declarative way.
This project will add some features to the Hazel language [1]. Hazel is a functional research language that uses gradual types to support unusual features such as holes (code placeholders), which give type meaning to incomplete programs. Importantly for this project, all Hazel programs, even ill-typed or incomplete ones, can be evaluated. This allows dynamic reasoning about ill-typed programs via evaluation traces, with the potential to improve the user's understanding of why ill-typed programs go wrong.
This project aims to exploit this potential further by adding features that both help find values or inputs demonstrating why type errors were found (type-error witnesses) and link the evaluation traces back to the source code.
Coming soon...
Student: Michał Mgeładze-Arciuch
See Anil Madhavapeddy's project description on his website
Anil Madhavapeddy and I share quite a few projects that we co-supervise.