AI Summary
This work addresses the memory wall between processors and storage, where data movement has become a critical performance bottleneck. Existing computational storage solutions struggle to scale due to programming complexity, ecosystem fragmentation, and thermal/power constraints. To overcome these limitations, the authors propose a reversible computational storage architecture that enables dynamic migration of WebAssembly-compiled storage actors between the host and CXL SSDs. The design leverages CXL.mem's cache coherence for seamless state sharing and introduces a zero-copy drain-and-switch protocol to manage thermal and power constraints. An agility-aware scheduler elastically dispatches compute tasks based on runtime conditions. Evaluations on both FPGA prototypes and commercial computational storage devices demonstrate up to 2× higher throughput and 3.75× lower write latency without requiring application modifications, effectively transforming rigid thermal limits into tunable performance trade-offs.
Abstract
The widening gap between processor speed and storage latency has made data movement a dominant bottleneck in modern systems. Two lines of storage-layer innovation attempted to close this gap: persistent memory shortened the latency hierarchy, while computational storage devices pushed processing toward the data. Neither has displaced conventional NVMe SSDs at scale, largely due to programming complexity, ecosystem fragmentation, and thermal/power cliffs under sustained load. We argue that storage-side compute should be \emph{reversible}: computation should migrate dynamically between host and device based on runtime conditions. We present \sys, which realizes this principle on CXL SSDs by decomposing I/O-path logic into migratable \emph{storage actors} compiled to WebAssembly. Actors share state through coherent CXL.mem regions; an agility-aware scheduler migrates them via a zero-copy drain-and-switch protocol when thermal or power constraints arise. Our evaluation on an FPGA-based CXL SSD prototype and two production CSDs shows that \sys turns hard thermal cliffs into elastic trade-offs, achieving up to 2$\times$ throughput improvement and 3.75$\times$ write latency reduction without application modification.
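The migration mechanism described above can be illustrated with a minimal sketch. All names here (`StorageActor`, `Scheduler`, the thermal threshold) are hypothetical stand-ins, not the paper's actual API; a shared `bytearray` plays the role of a coherent CXL.mem region, so switching the execution site moves no actor state (the zero-copy property), while the drain step quiesces in-flight I/O before the switch.

```python
# Hypothetical sketch of drain-and-switch actor migration; illustrative only.
from dataclasses import dataclass, field
from collections import deque

@dataclass
class StorageActor:
    name: str
    site: str = "device"  # current execution site: "device" or "host"
    # Stand-in for a coherent CXL.mem region: visible from either site,
    # so migration never copies it.
    state: bytearray = field(default_factory=lambda: bytearray(64))
    inflight: deque = field(default_factory=deque)

    def submit(self, req: str) -> None:
        self.inflight.append(req)

    def drain(self) -> None:
        # Quiesce: complete every in-flight request before switching sites.
        while self.inflight:
            self.inflight.popleft()  # stand-in for real I/O completion

class Scheduler:
    THERMAL_LIMIT_C = 85.0  # illustrative device thermal threshold

    def maybe_migrate(self, actor: StorageActor, device_temp_c: float) -> str:
        if actor.site == "device" and device_temp_c > self.THERMAL_LIMIT_C:
            actor.drain()        # 1. drain: no requests straddle the switch
            actor.site = "host"  # 2. switch: new requests dispatch to host
            # 3. zero-copy: actor.state is the same coherent region either way
        return actor.site

actor = StorageActor("compression")
actor.submit("write-4k")
sched = Scheduler()
print(sched.maybe_migrate(actor, device_temp_c=92.0))  # prints "host"
```

The key design point the sketch mirrors is that the thermal limit becomes a scheduling input rather than a hard stop: exceeding it triggers a site change instead of throttling, which is how a rigid thermal cliff turns into an elastic trade-off.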