TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System

📅 2025-04-24
🤖 AI Summary
To address high update latency and low throughput in erasure-coded storage systems, this paper proposes a two-stage logged-update mechanism: lightweight data logging in the synchronous stage, and real-time log recycling (garbage collection) that exploits spatio-temporal locality in the asynchronous stage. The design combines a three-layer logging structure with an SSD-lifetime-aware sequential I/O scheduler, converting random writes into sequential writes to substantially reduce update overhead. The work is presented as the first to decouple and optimize the erasure-code update path. Evaluated on real-world traces from Alibaba Cloud and Tencent Cloud, the approach improves update throughput by 7.6× and 5.0×, respectively, while reducing SSD read, write, and erase counts, extending SSD lifetime by up to 13×.
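The two-stage idea can be illustrated with a toy, single-node model. The sketch that follows is hypothetical: the class `LoggedStripe`, the chunk size, and the use of a single XOR parity chunk (RAID-5 style rather than a general Reed-Solomon code) are assumptions made for illustration. It omits TSUE's three-layer log, SSD-aware scheduling, and all networking. The synchronous `update` only appends to a log; the asynchronous `recycle` later applies each record to data and parity using the XOR delta.

```python
"""Toy two-stage logged update (hypothetical sketch, not TSUE's code).

Assumptions: one stripe of k data chunks protected by a single XOR
parity chunk, with all chunks held in memory as bytearrays.
"""
from collections import deque

CHUNK_SIZE = 8  # bytes per chunk (toy size, hypothetical)


class LoggedStripe:
    def __init__(self, k: int) -> None:
        self.data = [bytearray(CHUNK_SIZE) for _ in range(k)]
        self.parity = bytearray(CHUNK_SIZE)
        # Append-only log of (chunk, offset, payload) records. New records
        # only go to the tail, so the synchronous path is sequential.
        self.log = deque()

    def update(self, chunk: int, offset: int, payload: bytes) -> None:
        """Synchronous stage: append the update to the log and return."""
        if offset < 0 or offset + len(payload) > CHUNK_SIZE:
            raise ValueError("update exceeds chunk bounds")
        self.log.append((chunk, offset, bytes(payload)))

    def read(self, chunk: int) -> bytes:
        """Read a chunk, overlaying updates still pending in the log."""
        view = bytearray(self.data[chunk])
        for c, off, payload in self.log:  # later records win
            if c == chunk:
                view[off:off + len(payload)] = payload
        return bytes(view)

    def recycle(self) -> int:
        """Asynchronous stage: apply logged updates to data and parity.

        With XOR parity, P' = P xor D_old xor D_new over the touched bytes,
        so only the updated data range and parity range are rewritten.
        Returns the number of log records recycled.
        """
        count = 0
        while self.log:
            chunk, off, payload = self.log.popleft()
            old = self.data[chunk][off:off + len(payload)]
            for i, (o, n) in enumerate(zip(old, payload)):
                self.parity[off + i] ^= o ^ n
            self.data[chunk][off:off + len(payload)] = payload
            count += 1
        return count


if __name__ == "__main__":
    stripe = LoggedStripe(k=3)
    stripe.update(0, 0, b"\x01\x02")  # fast path: log append only
    stripe.update(0, 1, b"\x07")      # later write to the same chunk
    print(stripe.read(0)[:2])         # b'\x01\x07', served via the log
    print(stripe.recycle())           # 2 records applied
```

Reads consult the log so that updates are visible before recycling. In a real cluster, that lookup, and the placement of data and parity on different nodes, is where most of the engineering effort would go.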

📝 Abstract
Compared to replication-based storage systems, erasure-coded storage incurs significantly higher overhead during data updates. To address this issue, various parity logging methods have been proposed. Nevertheless, because erasure-code updates involve a long update path and a substantial amount of random I/O, the resulting long latency and low throughput often fail to meet the requirements of high-performance applications. To this end, we propose a two-stage data update method called TSUE. TSUE divides the update process into a synchronous stage that records updates in a data log and an asynchronous stage that recycles the log in real time. TSUE effectively reduces update latency by transforming random I/O into sequential I/O, and it significantly reduces recycle overhead by utilizing a three-layer log and the spatio-temporal locality of access patterns. In an SSD cluster, TSUE improves update performance by 7.6× under the Ali-Cloud trace and 5× under the Ten-Cloud trace, while also extending SSD lifespan by up to 13× by reducing the frequency of reads, writes, and erase operations.
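Why updates cost more under erasure coding can be seen from the parity-delta identity that holds for any linear code. This is a standard property that parity-logging schemes rely on, not a formula quoted from the paper. Assume a $(k, m)$ code over $\mathrm{GF}(2^w)$ whose $j$-th parity chunk is a fixed linear combination of the $k$ data chunks; the coefficients $\alpha_{ij}$ are generic placeholders.

```latex
P_j = \sum_{i=1}^{k} \alpha_{ij} D_i, \qquad j = 1, \dots, m

% Updating data chunk D_l to D_l' changes only the l-th term of the sum:
P_j' = P_j + \alpha_{lj}\,(D_l' - D_l)

% In GF(2^w), addition and subtraction are both bitwise XOR:
P_j' = P_j \oplus \alpha_{lj}\,(D_l' \oplus D_l), \qquad j = 1, \dots, m
```

A single small write therefore reads the old data chunk, writes the new one, and performs a read-modify-write on each of the $m$ parity chunks (typically stored on other nodes), whereas $r$-way replication only writes $r$ copies. Parity logging defers the parity step by appending the delta $\Delta_l = D_l' \oplus D_l$ to a log instead.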
Problem

High overhead of data updates in erasure-coded storage
Long update paths and heavy random I/O, causing high latency and low throughput
SSD wear from frequent reads, writes, and erase operations during updates
Innovation

Two-stage update method reduces latency
Transforms random I/O into sequential I/O
Three-layer log leverages access locality
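The exact layout of the three-layer log is not described on this page. As a hedged illustration of why access locality lowers recycle overhead in general, the hypothetical `coalesce` function merges logged updates before they are recycled. Repeated writes to the same bytes (temporal locality) keep only the newest value, and contiguous writes (spatial locality) collapse into one extent. This is a generic log-coalescing technique, not TSUE's three-layer structure.

```python
"""Generic log coalescing before recycling (hypothetical illustration)."""
from collections import defaultdict


def coalesce(records):
    """Merge logged updates so each byte is recycled once (latest wins).

    records: list of (chunk_id, offset, payload) in log (arrival) order.
    Returns {chunk_id: [(offset, merged_payload), ...]} with extents that
    are non-overlapping, non-adjacent, and sorted by offset.
    """
    # Temporal locality: replaying records in arrival order means a later
    # write to the same byte overwrites the earlier value in the map.
    latest = defaultdict(dict)
    for chunk, off, payload in records:
        for i, b in enumerate(payload):
            latest[chunk][off + i] = b

    # Spatial locality: runs of consecutive byte positions become a
    # single extent, i.e. one sequential write at recycle time.
    merged = {}
    for chunk, bytemap in latest.items():
        extents = []
        start = prev = None
        buf = bytearray()
        for pos in sorted(bytemap):
            if prev is not None and pos == prev + 1:
                buf.append(bytemap[pos])
            else:
                if buf:
                    extents.append((start, bytes(buf)))
                start, buf = pos, bytearray([bytemap[pos]])
            prev = pos
        if buf:
            extents.append((start, bytes(buf)))
        merged[chunk] = extents
    return merged


if __name__ == "__main__":
    log = [(0, 0, b"ab"), (0, 2, b"cd"), (0, 1, b"X"), (1, 10, b"z"), (0, 8, b"q")]
    print(coalesce(log))  # {0: [(0, b'aXcd'), (8, b'q')], 1: [(10, b'z')]}
```

Here five log records shrink to three extents, so the recycle stage issues fewer, larger writes. The per-byte map keeps the sketch short; a production log would more likely track extents in an interval tree or sorted map.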
Zheng Wei
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Jing Xing
Lingang Laboratory
Drug Discovery Data Mining
Yida Gu
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Wenjing Huang
RAND Corporation
Psychometrics, Structural Equation Modeling, Item Response Theory, Cyber Security
Guangming Tan
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Dingwen Tao
Chinese Academy of Sciences, IEEE/ACM Senior Member
High Performance Computing, Data Reduction, Deep Learning, Systems for ML, GPU