Building an OceanBase-based Distributed Nearly Real-time Analytical Processing Database System

📅 2026-02-07
🤖 AI Summary
This work proposes OceanBase Mercury, a distributed near real-time analytical processing system built on OceanBase and designed to deliver enterprise-grade analytical capabilities, including multi-tenancy, high availability, and elastic scalability, for petabyte-scale data. Traditional OLAP systems struggle to support real-time transaction processing and efficient analytics at the same time, and emerging real-time analytical systems still suffer from high data redundancy, complex cross-system synchronization, and poor timeliness. Mercury addresses these challenges through three key contributions: an adaptive columnar storage format with hybrid layout optimization, a differential refresh mechanism for materialized views that guarantees temporal consistency, and a polymorphic vectorized execution engine that supports three distinct data formats. Experimental results on real-world workloads show that Mercury achieves a 1.3× to 3.1× speedup in query latency over specialized OLAP engines while maintaining sub-second response times, balancing analytical depth with operational agility.

📝 Abstract
Modern data infrastructure increasingly demands database systems that can manage massive datasets efficiently while delivering both real-time transaction processing and advanced analytical capabilities. Traditional OLAP systems often fail to meet these dual requirements, and emerging real-time analytical processing systems still face persistent challenges such as excessive data redundancy, complex cross-system synchronization, and suboptimal temporal efficiency. This paper introduces OceanBase Mercury, an OLAP system designed for petabyte-scale data. The system features a distributed, multi-tenant architecture that meets essential enterprise-grade requirements, including continuous availability and elastic scalability. Our technical contributions comprise three key components: (1) an adaptive columnar storage format with hybrid data layout optimization, (2) a differential refresh mechanism for materialized views with temporal consistency guarantees, and (3) a polymorphic vectorization engine supporting three distinct data formats. Empirical evaluations under real-world workloads demonstrate that OceanBase Mercury achieves a 1.3× to 3.1× speedup in query latency over specialized OLAP engines while maintaining sub-second latency, positioning it as an analytical processing (AP) solution that balances analytical depth with operational agility in big data environments.
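The abstract does not describe how the differential refresh mechanism works, so the following is only a generic sketch of the idea behind incremental materialized view maintenance, not Mercury's implementation. It assumes a per-key SUM/COUNT view and a versioned change log; the names `Delta`, `SumCountView`, and `full_recompute` are hypothetical and chosen for illustration. Refreshing to a fixed snapshot version stands in for the temporal consistency guarantee: the view always reflects exactly the changes up to one version.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Delta:
    """One base-table change (hypothetical): sign is +1 for insert, -1 for delete."""
    version: int
    key: str
    amount: int
    sign: int


@dataclass
class SumCountView:
    """Hypothetical materialized view holding SUM(amount) and COUNT(*) per key."""
    refreshed_to: int = 0
    sums: dict = field(default_factory=lambda: defaultdict(int))
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def refresh(self, log, snapshot_version):
        """Apply only the deltas with refreshed_to < version <= snapshot_version."""
        for d in log:
            if self.refreshed_to < d.version <= snapshot_version:
                self.sums[d.key] += d.sign * d.amount
                self.counts[d.key] += d.sign
                if self.counts[d.key] == 0:
                    # The group has no rows left, so drop it from the view.
                    del self.sums[d.key]
                    del self.counts[d.key]
        self.refreshed_to = snapshot_version

    def rows(self):
        return {k: (self.sums[k], self.counts[k]) for k in self.counts}


def full_recompute(log, snapshot_version):
    """Reference result: rebuild the aggregates from scratch at a snapshot."""
    sums, counts = {}, {}
    for d in log:
        if d.version <= snapshot_version:
            sums[d.key] = sums.get(d.key, 0) + d.sign * d.amount
            counts[d.key] = counts.get(d.key, 0) + d.sign
    return {k: (sums[k], counts[k]) for k in counts if counts[k] != 0}


log = [
    Delta(1, "a", 10, +1),
    Delta(2, "b", 5, +1),
    Delta(3, "a", 10, -1),
    Delta(4, "b", 7, +1),
]
view = SumCountView()
view.refresh(log, snapshot_version=2)   # view reflects versions 1..2
first = view.rows()
view.refresh(log, snapshot_version=4)   # only versions 3..4 are applied
second = view.rows()
```

In this sketch the second refresh touches only the two new deltas, yet the result matches a full recomputation at the same snapshot, which is the property a differential refresh is meant to preserve.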
Problem

Research questions and friction points this paper is trying to address.

real-time analytical processing
OLAP
data redundancy
cross-system synchronization
temporal efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive columnar storage
differential refresh
polymorphic vectorization
materialized views
distributed OLAP
Quanqing Xu
Ant Group
Cloud Computing, Cloud Storage, Large-scale Hybrid Storage Systems
Chuanhui Yang
OceanBase, Ant Group
Ruijie Li
MPhil, Hong Kong University of Science and Technology (Guangzhou)
LLM, Multimodal, Graph Learning
Dongdong Xie
OceanBase, Ant Group
Hui Cao
OceanBase, Ant Group
Yi Xiao
OceanBase, Ant Group
Junquan Chen
OceanBase, Ant Group
Yanzuo Wang
OceanBase, Ant Group
Saitong Zhao
OceanBase, Ant Group
Fusheng Han
OceanBase, Ant Group
Bin Liu
OceanBase, Ant Group
Guoping Wang
OceanBase, Ant Group
Yuzhong Zhao
OceanBase, Ant Group
Mingqiang Zhuang
OceanBase, Ant Group