SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes

📅 2026-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the interleaved layout used by existing serialized representations of recursive algebraic datatypes (ADTs): all fields share a single contiguous buffer, much like an Array-of-Structures (AoS) layout, which limits memory locality and data-parallel execution. The paper extends the Structure-of-Arrays (SoA) idea to serialized recursive ADTs through factored, multi-buffer layouts that store different fields in separate buffers. The authors formalize the approach in SoCal, a language for generating factored ADT representations, and implement it in the Colobus compiler, which automatically transforms functional programs to operate on the factored layout. On a suite of tree-processing benchmarks, the approach achieves a 1.46× geometric mean speedup.
📝 Abstract
Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.
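To make the contrast between an interleaved single-buffer layout and a factored multi-buffer layout concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not SoCal syntax or Colobus's actual representation: the `Leaf`/`Node` tree type, the integer tag encoding, and all function names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical constructor tags; the real encoding used by Colobus is not given in the abstract.
LEAF, NODE = 0, 1


@dataclass
class Leaf:
    value: int


@dataclass
class Node:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def serialize_interleaved(t: Tree, buf: List[int]) -> List[int]:
    """Pre-order serialization into ONE buffer: tags and payloads are interleaved (AoS-like)."""
    if isinstance(t, Leaf):
        buf.append(LEAF)
        buf.append(t.value)
    else:
        buf.append(NODE)
        serialize_interleaved(t.left, buf)
        serialize_interleaved(t.right, buf)
    return buf


def serialize_factored(t: Tree, tags: List[int], ints: List[int]) -> Tuple[List[int], List[int]]:
    """Pre-order serialization into TWO buffers: one for tags, one for integer payloads (SoA-like)."""
    if isinstance(t, Leaf):
        tags.append(LEAF)
        ints.append(t.value)
    else:
        tags.append(NODE)
        serialize_factored(t.left, tags, ints)
        serialize_factored(t.right, tags, ints)
    return tags, ints


def sum_interleaved(buf: List[int]) -> int:
    """Sum leaf values by walking the single buffer and skipping over tags."""
    total, i = 0, 0
    while i < len(buf):
        if buf[i] == LEAF:
            total += buf[i + 1]  # payload sits right after its tag
            i += 2
        else:
            i += 1  # a NODE tag carries no payload in this encoding
    return total


def sum_factored(ints: List[int]) -> int:
    """Sum leaf values by scanning only the dense payload buffer; tags are never touched."""
    return sum(ints)


tree = Node(Leaf(1), Node(Leaf(2), Leaf(3)))
interleaved = serialize_interleaved(tree, [])
tags, ints = serialize_factored(tree, [], [])
```

In this sketch, a traversal that needs only the integer payloads reads one dense buffer in the factored layout, whereas the interleaved layout forces it to step over every tag. That is the kind of locality benefit an SoA-style layout is meant to provide; the actual speedups reported in the paper come from Colobus, not from this toy encoding.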
Problem

Research questions and friction points this paper is trying to address.

memory-layout factorization
recursive datatypes
structure-of-arrays
algebraic datatypes
tree-shaped data structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-layout factorization
structure-of-arrays
algebraic datatypes
recursive data structures
compiler transformation