🤖 AI Summary
This work proposes OceanBase Mercury, a distributed near real-time analytical processing system built on OceanBase and designed to deliver enterprise-grade capabilities, including multi-tenancy, high availability, and elastic scalability, for petabyte-scale data. Traditional OLAP systems struggle to support real-time transaction processing and efficient analytics at the same time, while emerging real-time analytical systems still suffer from high data redundancy, complex cross-system synchronization, and poor timeliness. Mercury addresses these challenges through three key contributions: an adaptive columnar storage format with hybrid data layout optimization, a differential refresh mechanism for materialized views that guarantees temporal consistency, and a polymorphic vectorized execution engine that supports three distinct data formats. Experimental results on real-world workloads show that Mercury achieves a 1.3× to 3.1× speedup in query latency over specialized OLAP engines while maintaining sub-second response times, balancing analytical depth with operational agility.
📝 Abstract
Modern data infrastructure increasingly demands database systems that can manage massive datasets efficiently while delivering both real-time transaction processing and advanced analytics. Traditional OLAP systems often fail to meet these dual requirements, and emerging real-time analytical processing systems still face persistent challenges such as excessive data redundancy, complex cross-system synchronization, and poor timeliness. This paper introduces OceanBase Mercury, an OLAP system designed for petabyte-scale data. The system features a distributed, multi-tenant architecture that meets essential enterprise-grade requirements, including continuous availability and elastic scalability. Our technical contributions comprise three key components: (1) an adaptive columnar storage format with hybrid data layout optimization, (2) a differential refresh mechanism for materialized views with temporal consistency guarantees, and (3) a polymorphic vectorization engine supporting three distinct data formats. Empirical evaluations under real-world workloads demonstrate that OceanBase Mercury achieves a 1.3× to 3.1× speedup in query latency over specialized OLAP engines while maintaining sub-second latency, positioning it as an analytical processing (AP) solution that effectively balances analytical depth with operational agility in big data environments.