🤖 AI Summary
Distributed storage systems commonly enforce strong consistency between data and metadata via ordered writes—installing data first, then updating metadata—which adds substantial write latency and limits throughput. This work, SwitchDelta, moves metadata updates out of the critical I/O path by buffering in-flight metadata updates in programmable switches, making newly written data visible in the network while retaining strong consistency. A best-effort data plane design works around the resource limits of switches, and a novel metadata update protocol exploits the benefits of in-network data visibility. Evaluated on three distributed in-memory storage systems—log-structured key-value stores, file systems, and secondary indexes—SwitchDelta reduces write latency by up to 52.4% and improves throughput by up to 126.9% under write-heavy workloads.
📝 Abstract
Distributed storage systems typically maintain strong consistency between data nodes and metadata nodes by adopting ordered writes: 1) first installing data; 2) then updating metadata to make the data visible. We propose SwitchDelta to accelerate ordered writes by moving metadata updates out of the critical path. It buffers in-flight metadata updates in programmable switches to enable data visibility in the network and retain strong consistency. SwitchDelta uses a best-effort data plane design to overcome the resource limitations of switches and introduces a novel metadata update protocol to exploit the benefits of in-network data visibility. We evaluate SwitchDelta in three distributed in-memory storage systems: log-structured key-value stores, file systems, and secondary indexes. The evaluation shows that SwitchDelta reduces the latency of write operations by up to 52.4% and boosts throughput by up to 126.9% under write-heavy workloads.
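To make the contrast concrete, here is a minimal, purely illustrative sketch of the two write paths the abstract describes. All names (`switch_cache`, `decoupled_write`, etc.) are hypothetical and not from the paper; the in-switch metadata buffer is modeled as an in-process dictionary, and the asynchronous forwarding to the metadata node as a background thread.

```python
import threading
import queue

data_store = {}       # models the data node
metadata_store = {}   # models the metadata node
switch_cache = {}     # models the in-switch buffer of in-flight metadata

def ordered_write(key, value, version):
    """Traditional ordered write: both steps sit on the critical path."""
    data_store[key] = value        # 1) install data
    metadata_store[key] = version  # 2) then update metadata; client waits for both

update_queue = queue.Queue()

def decoupled_write(key, value, version):
    """SwitchDelta-style write (sketch): metadata update leaves the critical path."""
    data_store[key] = value            # 1) install data
    switch_cache[key] = version        # 2) buffer metadata "in the switch"
    update_queue.put((key, version))   # 3) forward to metadata node asynchronously
    # returns here: the write completes before the metadata node applies the update

def metadata_applier():
    """Background application of forwarded metadata updates."""
    while True:
        key, version = update_queue.get()
        metadata_store[key] = version  # apply at the metadata node
        switch_cache.pop(key, None)    # drop the in-flight entry once applied
        update_queue.task_done()

def read_version(key):
    """Reads consult the in-flight buffer first, so data is visible immediately."""
    return switch_cache.get(key, metadata_store.get(key))

threading.Thread(target=metadata_applier, daemon=True).start()
decoupled_write("k1", "v1", version=1)
print(read_version("k1"))      # data already visible via the buffered metadata
update_queue.join()            # wait for the background update to land
print(metadata_store["k1"])    # metadata node has now caught up
```

The sketch only captures the ordering idea: reads stay strongly consistent because they see the buffered metadata until the metadata node catches up. The real system enforces this in the switch data plane under strict hardware resource constraints, which this toy model does not attempt to represent.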