Near Data Processing in Taurus Database

📅 2025-06-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address high network transfer overhead, heavy CPU utilization in the compute layer, and degraded performance on complex queries in cloud-native databases, this paper presents the first systematic implementation of Near-Data Processing (NDP) in Huawei Cloud’s Taurus database. It pushes selection, projection, and aggregation down to the storage layer and proposes a dynamic execution plan pushdown mechanism that integrates data partition pruning, vectorized execution, and distributed query optimization, so that data is filtered and computation is offloaded at the storage nodes. On TPC-H at 100 GB, 18 of 22 queries benefit from NDP: data shipped over the network is reduced by 63%, and compute-layer CPU time by 50%. Query Q15 benefits most, with a 98% reduction in data shipped, a 91% reduction in CPU time, and an 80% reduction in run time. These results indicate that NDP improves throughput and frees up system resources.

📝 Abstract

Huawei's cloud-native database system GaussDB for MySQL (also known as Taurus) stores data in a separate storage layer consisting of a pool of storage servers. Each server has considerable compute power, making it possible to push data reduction operations (selection, projection, and aggregation) close to storage. This paper describes the design and implementation of near data processing (NDP) in Taurus. NDP has several benefits: it reduces the amount of data shipped over the network, frees up CPU capacity in the compute layer, and reduces query run time, thereby enabling higher system throughput. Experiments with the TPC-H benchmark (100 GB) showed that 18 out of 22 queries benefited from NDP: data shipped was reduced by 63 percent and CPU time by 50 percent. On Q15 the impact was even higher: data shipped was reduced by 98 percent, CPU time by 91 percent, and run time by 80 percent.
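The idea of pushing selection, projection, and aggregation to storage can be illustrated with a toy model. The following is a minimal sketch, not Taurus code: `StorageNode`, `PushdownSpec`, `scan_ndp`, and the row layout are all hypothetical names chosen for illustration. It compares how many rows cross the "network" when the compute layer evaluates a `SUM` with a filter itself versus when each storage node returns only a partial result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class PushdownSpec:
    """Hypothetical description of the work pushed to storage."""
    predicate: Callable[[Row], bool]      # selection
    columns: Tuple[str, ...]              # projection
    sum_column: Optional[str] = None      # optional aggregation (SUM)


class StorageNode:
    """Toy storage server holding a slice of a table (an assumption)."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)

    def scan(self) -> List[Row]:
        # Baseline: every row, with every column, is shipped.
        return list(self.rows)

    def scan_ndp(self, spec: PushdownSpec) -> List[Row]:
        # Selection happens next to the data.
        selected = (r for r in self.rows if spec.predicate(r))
        if spec.sum_column is not None:
            # Aggregation: ship one partial sum per node.
            partial = sum(r[spec.sum_column] for r in selected)
            return [{"partial_sum": partial}]
        # Projection: ship only the requested columns.
        return [{c: r[c] for c in spec.columns} for r in selected]


def sum_without_ndp(nodes: List[StorageNode], spec: PushdownSpec):
    shipped = [r for n in nodes for r in n.scan()]
    total = sum(r[spec.sum_column] for r in shipped if spec.predicate(r))
    return total, len(shipped)


def sum_with_ndp(nodes: List[StorageNode], spec: PushdownSpec):
    shipped = [p for n in nodes for p in n.scan_ndp(spec)]
    # The compute layer only merges the partial sums.
    total = sum(p["partial_sum"] for p in shipped)
    return total, len(shipped)


def make_nodes() -> List[StorageNode]:
    rows = [{"id": i, "price": i * 10, "comment": "x" * 50} for i in range(1000)]
    return [StorageNode(rows[:500]), StorageNode(rows[500:])]
```

In this sketch both paths compute the same answer; the difference is only in how many rows are shipped and where the filtering and summing take place. The percentages reported in the abstract come from the paper's TPC-H experiments and are not reproduced by this toy model.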

Problem

Research questions and friction points this paper is trying to address.

High network traffic when data is shipped from the storage layer to the compute layer

Heavy CPU load in the compute layer from work that storage servers could perform

Long query run times that limit system throughput

Innovation

Methods, ideas, or system contributions that make the work stand out.

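Push data reduction operations (selection, projection, and aggregation) close to storage

Reduce the data shipped over the network by 63 percent on TPC-H (100 GB)

Cut compute-layer CPU time by 50 percent and shorten query run time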
