Raven Update #1

Also posted on X/Twitter

Welcome to the second weekly update for Raven, OCaml's upcoming scientific computing stack.

This week has been a whirlwind of progress. The good news is that I've finished the rewrite aimed at simplifying the backend interface. With that in place, we got a working demo of MNIST classification running on Rune, our autodiff library.

The bad news is that it is extremely slow. Slow enough that we can't realistically release it in its current state, even for an alpha.

That shifted our focus to performance, which has been the theme of this week.

Performance Work

I added landmarks profiling infrastructure to identify where the slowdown was coming from: the convolution operations were taking orders of magnitude longer than expected.
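The idea behind this style of profiling is to wrap operations in named scopes and accumulate elapsed time per scope. As a rough stdlib-only illustration (this is not the landmarks API, and `conv2d` here is just a stand-in name):

```ocaml
(* Toy scope profiler: accumulate elapsed time per named operation.
   The real landmarks library does this (and much more) via ppx annotations. *)
let totals : (string, float) Hashtbl.t = Hashtbl.create 16

let profiled name f x =
  let t0 = Sys.time () in
  let result = f x in
  let dt = Sys.time () -. t0 in
  let prev = Option.value (Hashtbl.find_opt totals name) ~default:0.0 in
  Hashtbl.replace totals name (prev +. dt);
  result

let () =
  let y = profiled "conv2d" (fun n -> n * n) 21 in
  Printf.printf "conv2d -> %d, %.6fs total\n" y (Hashtbl.find totals "conv2d")
```

Sorting `totals` by accumulated time is then enough to spot which named operation dominates.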

Initial investigation showed that the issue wasn't the convolution algorithm itself. Rather, we were chaining many view operations - reshape, permute, slice - and some of them created views with strides that required memory reordering, forcing us to make several temporary contiguous copies of the tensors to accommodate the view operations.

These multiple intermediate copies are the bottleneck.
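To make the problem concrete, here's a small sketch (hypothetical helper names, not nx internals) of why a permuted view blocks an in-place reshape: the strides computed from the shape in row-major order no longer match the view's strides after a permute, so the data has to be copied into contiguous order first.

```ocaml
(* Row-major strides for a given shape: stride of dim i is the product
   of all dims to its right. *)
let row_major_strides shape =
  let n = Array.length shape in
  let s = Array.make n 1 in
  for i = n - 2 downto 0 do
    s.(i) <- s.(i + 1) * shape.(i + 1)
  done;
  s

let is_contiguous shape strides = strides = row_major_strides shape

let () =
  (* A [2;3] tensor is laid out with strides [3;1]. *)
  assert (row_major_strides [|2; 3|] = [|3; 1|]);
  (* Permuting to [3;2] swaps the strides to [1;3] without copying bytes... *)
  let permuted_shape, permuted_strides = ([|3; 2|], [|1; 3|]) in
  (* ...but the result is no longer contiguous, so reshape must copy first. *)
  assert (not (is_contiguous permuted_shape permuted_strides))
```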

I made some initial attempts to optimize the convolution flow by improving how reshapes are handled, which yielded roughly a 10% improvement, but it became clear that the real solution was an architectural change.

Lazy Views and Symbolic Shapes

To address the performance bottleneck identified above, this week I've been implementing lazy view operations and symbolic shapes in nx.

The core idea is simple: instead of eagerly materializing tensors when view operations create complex stride patterns, we keep track of the sequence of transformations and only materialize when absolutely necessary. Operations like reshape, transpose, and expand now just update metadata tracking how to interpret the underlying data, rather than copying bytes around.
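As a rough sketch of this design (hypothetical types, not the actual nx implementation), each view op becomes an O(1) metadata update on a pending-transform list, and only materialization would touch the bytes:

```ocaml
(* Hypothetical lazy-view record: view ops append a transform instead of
   copying; only materialization (not shown) would walk the list and copy. *)
type transform =
  | Reshape of int array
  | Permute of int array
  | Expand of int array

type lazy_view = { data : float array; pending : transform list }

let reshape v shape = { v with pending = Reshape shape :: v.pending }
let permute v axes = { v with pending = Permute axes :: v.pending }
let expand v shape = { v with pending = Expand shape :: v.pending }

let () =
  let v = { data = Array.make 6 0.0; pending = [] } in
  let v' = permute (reshape v [|3; 2|]) [|1; 0|] in
  (* No bytes were copied: same underlying buffer, two deferred transforms. *)
  assert (v'.data == v.data);
  assert (List.length v'.pending = 2)
```

A chain of view ops then collapses into at most one copy at materialization time, instead of one copy per awkward stride pattern.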

While refactoring our view layers, I've also introduced symbolic shapes, replacing concrete integer arrays with dimensions that can be bound at runtime. This enables shape-polymorphic kernels and dynamic batching - essential features for JIT compilation.
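To illustrate symbolic shapes (again with hypothetical names), a dimension can be either a concrete integer or a named variable that an environment binds at run time - which is what makes a kernel shape-polymorphic over, say, the batch dimension:

```ocaml
(* A dimension is either known statically or a named symbol bound later. *)
type dim = Static of int | Sym of string

let bind env = function
  | Static n -> n
  | Sym name -> List.assoc name env

(* Resolve a symbolic shape against runtime bindings. *)
let concretize env shape = Array.map (bind env) shape

let () =
  let shape = [| Sym "batch"; Static 28; Static 28 |] in
  (* The same symbolic shape serves any batch size. *)
  assert (concretize [ ("batch", 32) ] shape = [| 32; 28; 28 |]);
  assert (concretize [ ("batch", 1) ] shape = [| 1; 28; 28 |])
```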

The implementation touches every layer of the stack. I've introduced a new Lazy_view module that tracks sequences of view transformations, updated the tensor type to use this instead of a single view, and modified both Native and Metal backends to handle these lazy views appropriately.

The implementation is in place, but many tests are currently broken. I'm working through these systematically before we can benchmark the new implementation to measure the impact it had on performance.

Structured Error Reporting

We've completely revamped error handling in nx with a new structured error reporting system. Gone are generic exception messages - now every error provides context about what went wrong, why it failed, and often hints about how to fix it. Errors follow a consistent format:

reshape: cannot reshape [10,10] to [12,10] (100→120 elements)
hint: check your target shape dimensions

This might seem like housekeeping, but I think it will have a real impact on developer experience. When your reshape fails, you'll see exactly which dimensions are incompatible and get a suggestion to call contiguous() first. These quality-of-life improvements add up - and who knows, maybe Raven will get to be known for the quality of its error messages.
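For the curious, a message in that format is cheap to produce. Here's a minimal sketch (hypothetical function, not nx's actual error module) that formats a failed reshape this way:

```ocaml
(* Sketch of structured error formatting for a failed reshape. *)
let show_shape a =
  "[" ^ String.concat "," (Array.to_list (Array.map string_of_int a)) ^ "]"

let reshape_error ~src ~dst =
  let numel = Array.fold_left ( * ) 1 in
  Printf.sprintf
    "reshape: cannot reshape %s to %s (%d→%d elements)\nhint: check your target shape dimensions"
    (show_shape src) (show_shape dst) (numel src) (numel dst)

let () = print_string (reshape_error ~src:[|10; 10|] ~dst:[|12; 10|])
```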

Website Launch

The Raven website is now live at https://raven-ml.dev. I'm quite happy with where it landed. It features a minimalist design with landing pages for each library and the beginnings of our documentation. While still sparse, it gives us a foundation to build on as we head towards the first release.

The next step will be to integrate the odoc-generated documentation.

On Thursday, I was feeling creative and wrote an introduction page - have a look to learn more about the motivation behind Raven.

What's Next

The immediate focus is fixing the remaining test failures from the lazy view refactor and ensuring all dependent libraries still work correctly. Once that's stable, we'll showcase an MNIST demo using Rune with the newly optimized convolution operations. Crossing fingers that the performance will be acceptable.

Once we get that demo up and running, I will go back to focusing on the Quill editor to fix all of the major issues, and we should be good to go for a release!

Get Involved

I'd love to hear your thoughts on Raven's direction. What features would make OCaml a viable option for your use case? What's holding you back from using OCaml for numerical work today?

Want to contribute? The codebase is still evolving rapidly, but I'm happy to guide anyone interested in helping. Even just trying out the examples and reporting what doesn't work as expected would be incredibly valuable at this stage.

Please reach out on GitHub or drop me an email!