🤖 AI Summary
Existing systems struggle to efficiently support real-time ingestion and hybrid continuous querying over multimodal data (text, image, video, spatial, and relational), particularly under high-throughput writes and complex semantic retrieval. This paper introduces ARCADE, a unified real-time data system for multimodal data with three key components: (1) a unified disk-based secondary index on LSM-based storage that covers vector, spatial, and text data; (2) a cost-based query optimizer for hybrid queries; and (3) an incremental materialized view framework that accelerates continuous queries. The system is built on RocksDB for storage and the MySQL query engine for query processing. Experiments show up to 7.4× higher throughput on read-heavy workloads and up to 1.4× on write-heavy workloads compared with leading multimodal data systems.
📝 Abstract
The explosive growth of multimodal data (spanning text, image, video, spatial, and relational modalities), coupled with the need for real-time semantic search and retrieval over these data, has outpaced the capabilities of existing multimodal and real-time database systems, which either lack efficient ingestion and continuous query capability or fall short in supporting expressive hybrid analytics. We introduce ARCADE, a real-time data system that efficiently supports high-throughput ingestion and expressive hybrid and continuous query processing across diverse data types. ARCADE introduces a unified disk-based secondary index on LSM-based storage for vector, spatial, and text data modalities, a comprehensive cost-based query optimizer for hybrid queries, and an incremental materialized view framework for efficient continuous queries. Built on the open-source RocksDB storage engine and the MySQL query engine, ARCADE outperforms leading multimodal data systems by up to 7.4x on read-heavy and 1.4x on write-heavy workloads.
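To make the idea of an incremental materialized view for continuous queries concrete, here is a minimal Python sketch. It is not ARCADE's implementation: the class `IncrementalCountView`, the row layout (`label`, `loc`), and the example query (count rows per label within a radius of a point) are all hypothetical, chosen only to illustrate the general technique of updating a stored result on each write instead of rescanning the data.

```python
from collections import defaultdict
from math import hypot


class IncrementalCountView:
    """Hypothetical continuous query: rows per label within `radius` of `center`.

    The stored result is adjusted on every insert or delete, so reading
    the view never requires rescanning the base data.
    """

    def __init__(self, center, radius):
        self.center = center
        self.radius = radius
        self.counts = defaultdict(int)

    def _matches(self, row):
        # Spatial predicate of the illustrative query.
        x, y = row["loc"]
        return hypot(x - self.center[0], y - self.center[1]) <= self.radius

    def on_insert(self, row):
        # Apply the delta of a single new row to the view.
        if self._matches(row):
            self.counts[row["label"]] += 1

    def on_delete(self, row):
        # Apply the inverse delta for a removed row.
        if self._matches(row):
            self.counts[row["label"]] -= 1
            if self.counts[row["label"]] == 0:
                del self.counts[row["label"]]

    def result(self):
        return dict(self.counts)


def recompute(rows, center, radius):
    """Baseline: evaluate the same query from scratch over all rows."""
    counts = defaultdict(int)
    for row in rows:
        x, y = row["loc"]
        if hypot(x - center[0], y - center[1]) <= radius:
            counts[row["label"]] += 1
    return dict(counts)
```

Each write costs constant work in this sketch, whereas `recompute` scans every row; the two must always agree, which is the correctness condition any incremental view maintenance scheme has to preserve.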