Building an OceanBase-based Distributed Nearly Real-time Analytical Processing Database System

📅 2026-02-07

🤖 AI Summary
This work proposes OceanBase Mercury, a distributed near real-time analytical processing system built on OceanBase and designed to deliver enterprise-grade analytical capabilities (multi-tenancy, high availability, and elastic scalability) for petabyte-scale data. Traditional OLAP systems struggle to support real-time transactions and efficient analytics at scale simultaneously, and they often suffer from high data redundancy, complex cross-system synchronization, and poor timeliness. Mercury addresses these challenges through three key innovations: an adaptive columnar storage format with hybrid layout optimization, a differential refresh mechanism for materialized views that ensures temporal consistency, and a polymorphic vectorized execution engine that supports three distinct data formats. Experiments on real-world workloads show that Mercury achieves 1.3–3.1× speedups in query latency over specialized OLAP engines while maintaining sub-second response times, balancing analytical depth with operational agility.
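To make the idea of a "polymorphic" vectorized engine concrete, here is a minimal sketch of one operator interface that accepts several physical vector formats and picks a format-specific code path for each. The summary does not name Mercury's three formats, so the flat, constant, and dictionary-encoded formats below, along with every class and function name, are hypothetical choices for illustration and do not describe the paper's actual engine.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FlatVector:
    values: List[int]  # hypothetical format: one physical value per row


@dataclass
class ConstVector:
    value: int   # hypothetical format: one value shared by every row
    length: int  # number of logical rows in the batch


@dataclass
class DictVector:
    dictionary: List[int]  # hypothetical format: distinct values
    codes: List[int]       # per-row index into the dictionary


Vector = Union[FlatVector, ConstVector, DictVector]


def filter_gt(vec: Vector, threshold: int) -> List[int]:
    """Return the row indices whose value is greater than threshold."""
    if isinstance(vec, ConstVector):
        # A single comparison decides the whole batch.
        return list(range(vec.length)) if vec.value > threshold else []
    if isinstance(vec, DictVector):
        # Evaluate the predicate once per distinct value, then map the codes.
        passes = [v > threshold for v in vec.dictionary]
        return [i for i, code in enumerate(vec.codes) if passes[code]]
    # Flat format: evaluate row by row.
    return [i for i, v in enumerate(vec.values) if v > threshold]


def sum_vector(vec: Vector) -> int:
    """SUM aggregate with a shortcut for each format."""
    if isinstance(vec, ConstVector):
        return vec.value * vec.length  # no per-row work needed
    if isinstance(vec, DictVector):
        # Count each code once, then weight the matching dictionary value.
        counts = Counter(vec.codes)
        return sum(vec.dictionary[code] * n for code, n in counts.items())
    return sum(vec.values)
```

For example, `filter_gt(DictVector([10, 2], [0, 1, 1, 0]), 5)` evaluates the predicate only twice (once per dictionary entry) and returns the rows whose code maps to 10. Keeping each batch in its native format, rather than expanding everything to flat arrays first, is the usual motivation for this kind of design.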

📝 Abstract
The growing demand for database systems that can manage massive datasets efficiently while delivering real-time transaction processing and advanced analytics has become critical in modern data infrastructure. Traditional OLAP systems often fail to meet these dual requirements, and emerging real-time analytical processing systems still face persistent challenges such as excessive data redundancy, complex cross-system synchronization, and suboptimal temporal efficiency. This paper introduces OceanBase Mercury, an OLAP system designed for petabyte-scale data. Its distributed, multi-tenant architecture meets essential enterprise-grade requirements, including continuous availability and elastic scalability. Our technical contributions comprise three key components: (1) an adaptive columnar storage format with hybrid data layout optimization, (2) a differential refresh mechanism for materialized views with temporal consistency guarantees, and (3) a polymorphic vectorization engine supporting three distinct data formats. Empirical evaluations under real-world workloads show that OceanBase Mercury achieves 1.3× to 3.1× speedups in query latency over specialized OLAP engines while maintaining sub-second latency, positioning it as an analytical processing (AP) solution that balances analytical depth with operational agility in big data environments.
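As a rough illustration of what a differential (incremental) refresh of a materialized view involves, the sketch below maintains a `SUM`/`COUNT` aggregate view by applying only the base-table changes committed since the last refresh, up to a chosen target version. That way the view reflects the base table as of a single version. The change-log format, the version-window rule, and all names (`Change`, `SumCountView`, `refresh`) are assumptions for illustration; the abstract does not describe Mercury's actual mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Change:
    """One row-level change captured from the base table (hypothetical log format)."""
    version: int  # commit version of the change
    key: str      # GROUP BY key of the affected row
    amount: int   # measure column value
    sign: int     # +1 for an insert, -1 for a delete


@dataclass
class SumCountView:
    """Materialized result of: SELECT key, SUM(amount), COUNT(*) FROM t GROUP BY key."""
    rows: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    refreshed_to: int = 0  # base-table version the view currently reflects

    def refresh(self, log: List[Change], target_version: int) -> None:
        """Fold in only the changes committed in (refreshed_to, target_version]."""
        if target_version <= self.refreshed_to:
            return  # already at or past the requested version
        for ch in log:
            if not (self.refreshed_to < ch.version <= target_version):
                continue  # outside the refresh window: skip
            total, count = self.rows.get(ch.key, (0, 0))
            total += ch.sign * ch.amount
            count += ch.sign
            if count == 0:
                self.rows.pop(ch.key, None)  # the group no longer has rows
            else:
                self.rows[ch.key] = (total, count)
        # Every change up to target_version has been applied, so the view
        # now matches the base table as of that single version.
        self.refreshed_to = target_version
```

Usage: with a log of inserts at versions 1, 2, and 4 and a delete at version 3, `refresh(log, 2)` applies only the first two changes, and a later `refresh(log, 4)` applies just the remaining two. Tracking `COUNT(*)` alongside `SUM` lets the sketch detect when a group becomes empty after deletes, which is a common reason to carry a count in incrementally maintained aggregates.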
Problem

Research questions and friction points this paper is trying to address.

real-time analytical processing
OLAP
data redundancy
cross-system synchronization
temporal efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive columnar storage
differential refresh
polymorphic vectorization
materialized views
distributed OLAP
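One common reading of "adaptive" columnar storage is choosing an encoding per column chunk based on the data it actually holds. The sketch below estimates the encoded size of an integer chunk under plain, run-length, and dictionary encoding and picks the smallest. The cost model (fixed value width, 4-byte run lengths, 1/2/4-byte dictionary codes) and the name `choose_encoding` are illustrative assumptions. The abstract does not say that Mercury's adaptivity or hybrid layout works this way.

```python
from typing import List


def choose_encoding(chunk: List[int], width: int = 8) -> str:
    """Pick the encoding with the smallest estimated size for one column chunk.

    Hypothetical cost model, in bytes:
      plain: width per value
      rle:   (value + 4-byte run length) per run
      dict:  width per distinct value + a fixed-size code per row
    """
    n = len(chunk)
    if n == 0:
        return "plain"
    # A new run starts wherever adjacent values differ.
    runs = 1 + sum(1 for a, b in zip(chunk, chunk[1:]) if a != b)
    distinct = len(set(chunk))
    code_bytes = 1 if distinct <= 256 else (2 if distinct <= 65536 else 4)
    estimates = {
        "plain": n * width,
        "rle": runs * (width + 4),
        "dict": distinct * width + n * code_bytes,
    }
    # On a tie, min() keeps the first key in insertion order, so plain wins.
    return min(estimates, key=estimates.get)
```

A constant chunk such as `[7] * 1000` favors run-length encoding, a low-cardinality chunk that alternates values favors a dictionary, and a chunk of unique values stays plain. A real engine would typically estimate from a sample and also weigh decode speed, not just size.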
Authors

Quanqing Xu
Ant Group
Cloud Computing, Cloud Storage, Large-scale Hybrid Storage Systems
Chuanhui Yang
OceanBase, Ant Group
Ruijie Li
MPhil, Hong Kong University of Science and Technology (Guangzhou)
LLM, Multimodal, Graph Learning
Dongdong Xie
OceanBase, Ant Group
Hui Cao
OceanBase, Ant Group
Yi Xiao
OceanBase, Ant Group
Junquan Chen
OceanBase, Ant Group
Yanzuo Wang
OceanBase, Ant Group
Saitong Zhao
OceanBase, Ant Group
Fusheng Han
OceanBase, Ant Group
Bin Liu
OceanBase, Ant Group
Guoping Wang
OceanBase, Ant Group
Yuzhong Zhao
OceanBase, Ant Group
Mingqiang Zhuang
OceanBase, Ant Group