Near Data Processing in Taurus Database

📅 2025-06-24
📈 Citations: 0
Influential: 0
📄 PDF

career value

191K/year
🤖 AI Summary
To address high network transfer overhead, excessive CPU utilization in the compute layer, and degraded performance for complex queries in cloud-native databases, this paper introduces the first systematic implementation of Near-Data Processing (NDP) in Huawei Cloud’s Taurus database. It pushes selection, projection, and aggregation operations down to the storage layer and proposes a dynamic execution plan pushdown mechanism—integrating data partition pruning, vectorized execution, and distributed query optimization—to enable efficient data filtering and computational offloading at storage nodes. Experimental evaluation on TPC-H 100GB shows that 18 out of 22 queries achieve significant performance gains: average network traffic is reduced by 63%, and compute-layer CPU time decreases by 50%. Notably, Query Q15 achieves a 98% reduction in data transfer, a 91% decrease in CPU time, and an 80% reduction in end-to-end latency. These results demonstrate the effectiveness and advancement of the approach in improving throughput and conserving system resources.

Technology Category

Application Category

📝 Abstract
Huawei's cloud-native database system GaussDB for MySQL (also known as Taurus) stores data in a separate storage layer consisting of a pool of storage servers. Each server has considerable compute power making it possible to push data reduction operations (selection, projection, and aggregation) close to storage. This paper describes the design and implementation of near data processing (NDP) in Taurus. NDP has several benefits: it reduces the amount of data shipped over the network; frees up CPU capacity in the compute layer; and reduces query run time, thereby enabling higher system throughput. Experiments with the TPCH benchmark (100 GB) showed that 18 out of 22 queries benefited from NDP; data shipped was reduced by 63 percent; and CPU time by 50 percent. On Q15 the impact was even higher: data shipped was reduced by 98 percent; CPU time by 91 percent; and run time by 80 percent.
Problem

Research questions and friction points this paper is trying to address.

Reduces network data transfer via near-data processing
Optimizes CPU usage in compute layer with storage-side operations
Improves query performance and system throughput significantly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Push data operations close to storage
Reduce network data transfer significantly
Improve query performance and CPU efficiency
🔎 Similar Papers
No similar papers found.
S
Shu Lin
Huawei Technologies Canada Co., Ltd., Markham, Ontario, Canada
Arunprasad P. Marathe
Arunprasad P. Marathe
PointClickCare
database systemscomputer science
P
Per-Åke Larson
Huawei Technologies Canada Co., Ltd., Markham, Ontario, Canada
C
Chong Chen
Huawei Technologies Canada Co., Ltd., Markham, Ontario, Canada
C
Calvin Sun
Huawei Technologies Canada Co., Ltd., Markham, Ontario, Canada
P
Paul Lee
Huawei Technologies Canada Co., Ltd., Markham, Ontario, Canada
Weidong Yu
Weidong Yu
Sun Yat-Sen University
Ocean observationOcean-Atmosphere InteractionMonsoon Variability