CHASE: A Native Relational Database for Hybrid Queries on Structured and Unstructured Data

📅 2025-01-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing databases struggle to efficiently support hybrid queries over structured and unstructured (e.g., vector) data, resulting in poor performance for joint semantic retrieval and SQL execution. This paper introduces CHASE, a query engine designed natively for hybrid queries. It addresses the problem through three core contributions: (1) semantic analysis that classifies queries and dynamically optimizes their plans; (2) customized physical operators that eliminate redundant computation; and (3) a compilation-based execution framework that generates efficient machine code for vector–relational hybrid workloads. Evaluated on real-world datasets, CHASE achieves end-to-end query speedups ranging from 13% to 7500× over state-of-the-art systems. These gains matter for recommendation and analytical scenarios that require tight coupling of semantic and relational operations.

📝 Abstract
Querying both structured and unstructured data has become a new paradigm in data analytics and recommendation. Unstructured data, such as text and videos, are converted to high-dimensional vectors and queried with approximate nearest neighbor search (ANNS). State-of-the-art database systems implement vector search as a plugin in the relational query engine, which tries to use the ANN index to improve performance. After investigating a broad range of hybrid queries, we find that such designs may miss optimization opportunities and deliver suboptimal performance for certain queries. In this paper, we propose CHASE, a query engine natively designed to support efficient hybrid queries on structured and unstructured data. CHASE applies targeted designs and optimizations at multiple stages of query processing. First, semantic analysis categorizes queries and optimizes query plans dynamically. Second, new physical operators avoid the redundant computations incurred by existing operators. Third, compilation-based techniques are adopted for efficient machine code generation. Extensive evaluations on real-world datasets show that CHASE achieves substantial performance improvements, with speedups ranging from 13% to 7500× over existing systems. These results highlight CHASE's potential as a robust solution for executing hybrid queries.
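To make the notion of a hybrid query concrete, the following minimal sketch evaluates one toy query ("the k rows nearest to a query vector among rows with price at most a threshold") under two plan orders. It is an illustration only: the table, the column names, and both functions are hypothetical, the search is exact brute force rather than an ANN index, and none of it reproduces CHASE's actual operators or optimizer.

```python
import heapq
import math

# Hypothetical toy table: one structured column (price) plus an
# embedding that stands in for the unstructured data.
ROWS = [
    {"id": 1, "price": 10, "vec": (0.0, 0.0)},
    {"id": 2, "price": 50, "vec": (0.1, 0.0)},
    {"id": 3, "price": 20, "vec": (1.0, 1.0)},
    {"id": 4, "price": 15, "vec": (0.2, 0.1)},
    {"id": 5, "price": 90, "vec": (0.0, 0.1)},
]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_then_search(rows, query, max_price, k):
    """Apply the relational predicate first, then rank survivors by distance.

    Returns the matching ids and the number of rows whose distance
    had to be computed.
    """
    candidates = [r for r in rows if r["price"] <= max_price]
    nearest = heapq.nsmallest(k, candidates, key=lambda r: l2(r["vec"], query))
    return [r["id"] for r in nearest], len(candidates)


def search_then_filter(rows, query, max_price, k):
    """Rank every row by distance first, then drop rows failing the predicate.

    Returns the matching ids and the number of rows whose distance
    had to be computed.
    """
    ranked = sorted(rows, key=lambda r: l2(r["vec"], query))
    hits = [r["id"] for r in ranked if r["price"] <= max_price][:k]
    return hits, len(rows)


ids_a, dist_evals_a = filter_then_search(ROWS, (0.0, 0.0), 20, 2)
ids_b, dist_evals_b = search_then_filter(ROWS, (0.0, 0.0), 20, 2)
print(ids_a, dist_evals_a)
print(ids_b, dist_evals_b)
```

Both orders return the same rows here, but the second computes distances for rows that the predicate later discards. This is one simple form of the redundant work that a plan choice can introduce; which order is cheaper in practice depends on predicate selectivity and on the index available.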
Problem

Research questions and friction points this paper is trying to address.

Database Systems
Mixed Regular and Irregular Data
Query Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

CHASE System
Dynamic Query Optimization
Hybrid Data Query Processing
Rui Ma
Fudan University
Kai Zhang
Fudan University
Zhenying He
Fudan University
Yinan Jing
Fudan University
X. Sean Wang
School of Computer Science, Fudan University
Database Systems, Information Security and Privacy, Wireless Sensor Networks, Streaming Data Processing, Time Series Queries, Dat
Zhenqiang Chen
Transwarp