home > posts

Reviving OCaml R-Tree

2023-08-09

ocaml • geospatial


Antonin Guttman first proposed R-Trees in 1984.

In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations...

There are lots of excellent resources for understanding more about R-Trees and all of the wonderful variations there of (e.g. R*-trees). This blog post seems to be the most popular introduction.

An OCaml R-Tree Implementation

In the Energy and Environment Group we are working on various geospatial tools, one of which is written in OCaml and needed a geospatial index to speedup spatial queries over a dataset. Searching Github revealed an implementation in pure OCaml. After reaching out to the author, they were more than happy for their work to be maintained in the Geocaml organisation, updated to the latest OCaml tools and hopefully released on opam.

The dunification went smoothly whilst fast-forwarding 15 OCaml compiler versions with no changes or problems (just dune unused variable warnings).

Quick Example

An R-Tree in OCaml is created from an envelope (the kind of bounding box) and a value (for storing in the tree). The value depends on the envelope as you have to provide a function to calculate an envelope from a value. Here is one way we might define a line using the provided two-dimensional Rectangle envelope.

module Line = struct
  type t =
    { p0 : float * float; p1 : float * float }

  let t =
    let open Repr in
    record "line" (fun p0 p1 -> { p0; p1 })
    |+ field "p0" (pair float float) (fun t -> t.p0)
    |+ field "p1" (pair float float) (fun t -> t.p1)
    |> sealr

  type envelope = Rtree.Rectangle.t

  let envelope { p0 = (x1, y1); p1 = (x2, y2) } =
    let x0 = Float.min x1 x2 in
    let x1 = Float.max x1 x2 in
    let y0 = Float.min y1 y2 in
    let y1 = Float.max y1 y2 in
    Rtree.Rectangle.v ~x0 ~y0 ~x1 ~y1
end

In addition to the envelope function, we also need to provide a runtime representation using Repr. In the example, we've constructed it by hand but I recommend using ppx_repr to derive it for you. From here we can create our index.

module R = Rtree.Make (Rtree.Rectangle) (Line)

Inserting Values

There are two ways to insert values into the index -- either by inserting them one-by-one or by bulk loading them. The latter is the recommended approach where possible as the resulting layout of the R-Tree is much better.

# let r =
    let i = R.empty 4 in
    R.insert i Line.{ p0 = (1., 1.); p1 = (2., 2.) };;
val r : R.t = <abstr>

As you might expect, bulk loading takes a list of values instead.

# let r =
    let lines =
        Line.[
        { p0 = (0., 0.); p1 = (1., 1.) };
        { p0 = (1., 1.); p1 = (2., 2.) };
        { p0 = (2., 2.); p1 = (3., 3.) };
        { p0 = (3., 3.); p1 = (4., 4.) };
        ]
    in
    R.load ~max_node_load:4 lines;;
val r : R.t = <abstr>

Spatial Queries

The lines we loaded into the last index are really just four segments of the line y = x. Let's spatially query for the first two segments.

# R.find r (Rtree.Rectangle.v ~x0:0. ~y0:0. ~x1:2. ~y1:2.);;
- : Line.t list =
[{Line.p0 = (0., 0.); p1 = (1., 1.)}; {Line.p0 = (1., 1.); p1 = (2., 2.)};
 {Line.p0 = (2., 2.); p1 = (3., 3.)}]

You can see the query actually returned the first three, that's because all bound checks are inclusive.

Visualising with Vg

The source repository includes a fairly simple example of rendering an R-Tree using Vg. This can be done outside of the repository using the tree representation.

# #show R.tree;;
val tree : R.t -> R.tree
type nonrec tree =
  R.tree =
    Node of (Line.envelope * R.tree) list
  | Leaf of (Line.envelope * Line.t) list
  | Empty

Here is an example rendering from the repository showing some random lines and the envelopes of the R-tree.