ARCADE: A Real-Time Data System for Hybrid and Continuous Query Processing across Diverse Data Modalities

📅 2025-09-24
🤖 AI Summary
Existing systems struggle to support real-time ingestion together with hybrid and continuous queries over multimodal data (text, image, video, spatial, and relational), particularly under high-throughput writes and complex semantic retrieval. This paper introduces ARCADE, a unified real-time data system for multimodal data built around three contributions: (1) a unified disk-based secondary index on LSM-tree storage that covers vector, spatial, and text data; (2) a cost-based query optimizer for hybrid queries; and (3) an incremental materialized view framework that accelerates continuous queries. The system is built on RocksDB for storage and the MySQL query engine. In the reported experiments, ARCADE outperforms leading multimodal data systems by up to 7.4× on read-heavy workloads and up to 1.4× on write-heavy workloads.

📝 Abstract
The explosive growth of multimodal data (spanning text, image, video, spatial, and relational modalities), coupled with the need for real-time semantic search and retrieval over these data, has outpaced the capabilities of existing multimodal and real-time database systems, which either lack efficient ingestion and continuous query capability or fall short in supporting expressive hybrid analytics. We introduce ARCADE, a real-time data system that efficiently supports high-throughput ingestion and expressive hybrid and continuous query processing across diverse data types. ARCADE introduces a unified disk-based secondary index on LSM-based storage for vector, spatial, and text data modalities, a comprehensive cost-based query optimizer for hybrid queries, and an incremental materialized view framework for efficient continuous queries. Built on the open-source RocksDB storage engine and the MySQL query engine, ARCADE outperforms leading multimodal data systems by up to 7.4x on read-heavy and 1.4x on write-heavy workloads.
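
To make the idea of a cost-based optimizer for hybrid queries concrete, here is a minimal Python sketch of one decision such an optimizer might face: whether to apply a relational filter before or after a vector similarity search. This is not ARCADE's optimizer. The `Stats` fields, the two cost formulas, and the constants `scan_cost`, `dist_cost`, and `probe_cost` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical statistics an optimizer could consult."""
    n_rows: int          # total rows in the table
    selectivity: float   # estimated fraction of rows passing the relational filter
    k: int               # number of nearest neighbours requested


def cost_prefilter(s: Stats, scan_cost: float = 1.0, dist_cost: float = 4.0) -> float:
    # Plan A: scan every row with the filter, then compute exact
    # vector distances only on the rows that survive.
    return s.n_rows * scan_cost + s.n_rows * s.selectivity * dist_cost


def cost_postfilter(s: Stats, probe_cost: float = 50.0) -> float:
    # Plan B: probe the vector index for enough candidates that roughly
    # k of them are expected to pass the filter afterwards.
    candidates = s.k / max(s.selectivity, 1e-9)
    return min(candidates, s.n_rows) * probe_cost


def choose_plan(s: Stats) -> str:
    # Pick whichever plan has the lower estimated cost.
    return "prefilter" if cost_prefilter(s) <= cost_postfilter(s) else "postfilter"


if __name__ == "__main__":
    print(choose_plan(Stats(n_rows=1_000_000, selectivity=0.5, k=10)))
    print(choose_plan(Stats(n_rows=1_000_000, selectivity=1e-6, k=10)))
```

Under these assumed costs, a permissive filter favours probing the vector index first, while a highly selective filter favours filtering first, because the index would have to return very many candidates before enough of them pass the filter.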
Problem

Research questions and friction points this paper is trying to address.

Addressing inefficient ingestion and continuous query processing for multimodal data
Supporting expressive hybrid analytics across diverse data modalities
Overcoming limitations of existing multimodal and real-time database systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified disk-based secondary index for multiple data modalities
Cost-based query optimizer for expressive hybrid analytics
Incremental materialized view framework for continuous queries
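
As a rough illustration of the incremental materialized view idea, the Python sketch below keeps the result of a continuous aggregate query (an average per key) up to date by applying each insert or delete as a delta, instead of recomputing the query over all data. It is a generic sketch of incremental view maintenance, not ARCADE's framework; the class name `IncrementalAvgView` and its methods are hypothetical.

```python
from collections import defaultdict


class IncrementalAvgView:
    """Hypothetical view maintaining AVG(value) GROUP BY key under inserts and deletes."""

    def __init__(self) -> None:
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def apply(self, key: str, value: float, sign: int = 1) -> None:
        # sign=+1 applies an insert, sign=-1 applies a delete.
        # Only the affected group's running sum and count are touched.
        self._sum[key] += sign * value
        self._count[key] += sign
        if self._count[key] == 0:
            # Drop groups that no longer have any rows.
            del self._sum[key]
            del self._count[key]

    def result(self) -> dict:
        # The current answer to the continuous query, read from the maintained state.
        return {k: self._sum[k] / self._count[k] for k in self._count}


if __name__ == "__main__":
    view = IncrementalAvgView()
    view.apply("a", 2.0)
    view.apply("a", 4.0)
    view.apply("b", 10.0)
    print(view.result())
    view.apply("a", 2.0, sign=-1)
    print(view.result())
```

Each update costs constant work per affected group, which is the property that makes this style of maintenance attractive for continuous queries under high ingestion rates.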
Authors

Jingyi Yang
University of Science and Technology of China
Computer Vision, Deep Learning, AI Agent, Generative Models, Reinforcement Learning

Songsong Mo
Nanyang Technological University, Singapore

Jiachen Shi
Nanyang Technological University, Singapore

Zihao Yu
University of Science and Technology of China

Kunhao Shi
Nanyang Technological University, Singapore

Xuchen Ding
Nanyang Technological University, Singapore

Gao Cong
Nanyang Technological University
Data Management, Databases, Data Mining, Spatial Databases