Beyond Relational: Semantic-Aware Multi-Modal Analytics with LLM-Native Query Optimization

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional relational query operators lack the semantic expressiveness needed for practical multi-modal analytics in domains such as e-commerce and healthcare. This paper proposes Nirvana, a multi-modal data analytics framework built around semantic-aware operators that use large language models (LLMs) as semantic engines. Its key contributions are: (1) a collaborative optimizer design comprising an agent-based logical optimizer and a cost-aware physical optimizer; (2) an adaptive LLM-backend selection mechanism grounded in a novel improvement-score metric; and (3) a joint logical-physical optimization strategy combining natural-language transformation rules, random-walk search, computation reuse, predicate evaluation pushdown, and model-capability hypotheses. Evaluated on three real-world benchmarks, Nirvana reduces end-to-end execution time by 10%-85% and cuts system cost by 76% on average, substantially outperforming state-of-the-art approaches.

📝 Abstract
Multi-modal analytical processing has the potential to transform applications in e-commerce, healthcare, entertainment, and beyond. However, real-world adoption remains elusive due to the limited ability of traditional relational query operators to capture query semantics. The emergence of foundation models, particularly large language models (LLMs), opens up new opportunities to develop flexible, semantic-aware data analytics systems that transcend the relational paradigm. We present Nirvana, a multi-modal data analytics framework that incorporates programmable semantic operators while leveraging both logical and physical query optimization strategies tailored for LLM-driven semantic query processing. Nirvana addresses two key challenges. First, it features an agentic logical optimizer that uses natural-language-specified transformation rules and random-walk-based search to explore vast spaces of semantically equivalent query plans, far beyond the capabilities of conventional optimizers. Second, it introduces a cost-aware physical optimizer that selects the most effective LLM backend for each operator using a novel improvement-score metric. To further enhance efficiency, Nirvana incorporates computation reuse and evaluation pushdown techniques guided by model capability hypotheses. Experimental evaluations on three real-world benchmarks demonstrate that Nirvana reduces end-to-end runtime by 10%--85% and reduces system processing costs by 76% on average, outperforming state-of-the-art systems in both efficiency and scalability.
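The random-walk logical optimization described above can be sketched in miniature: a plan is a sequence of semantic operators, transformation rules rewrite plans into semantically equivalent ones, and a random walk explores the rewrite space while remembering the cheapest plan seen. All names, the rule set, and the toy cost model below are illustrative assumptions, not the paper's actual implementation (which specifies rules in natural language and lets an agent apply them).

```python
import random

def swap_adjacent_filters(plan):
    """Rule: reorder two adjacent filters (commutative, hence equivalent)."""
    idxs = [i for i in range(len(plan) - 1)
            if plan[i][0] == "filter" and plan[i + 1][0] == "filter"]
    if not idxs:
        return None
    i = random.choice(idxs)
    new = list(plan)
    new[i], new[i + 1] = new[i + 1], new[i]
    return new

def push_filter_before_map(plan):
    """Rule: move a filter before a semantic map when the filter does not
    read the map's output (signaled here by a per-filter flag)."""
    idxs = [i for i in range(len(plan) - 1)
            if plan[i][0] == "map" and plan[i + 1][0] == "filter"
            and plan[i + 1][2]]  # filter is independent of the map's output
    if not idxs:
        return None
    i = random.choice(idxs)
    new = list(plan)
    new[i], new[i + 1] = new[i + 1], new[i]
    return new

def cost(plan):
    """Toy cost model: expensive semantic ops cost less when selective
    filters run first and shrink the row count they see."""
    rows, total = 1000.0, 0.0
    per_row = {"filter": 1.0, "map": 10.0}
    for op in plan:
        total += rows * per_row[op[0]]
        if op[0] == "filter":
            rows *= op[1]  # apply the filter's selectivity
    return total

def random_walk_optimize(plan, rules, steps=200, seed=0):
    """Walk the rewrite space; keep walking from each rewrite, but
    always remember the best-cost plan encountered."""
    random.seed(seed)
    best, best_cost = plan, cost(plan)
    current = plan
    for _ in range(steps):
        candidate = random.choice(rules)(current)
        if candidate is None:
            continue  # rule not applicable here; try another step
        current = candidate
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost

# Operators: ("map", -, -) and ("filter", selectivity, independent_of_map)
plan = [("map", None, None), ("filter", 0.1, True), ("filter", 0.5, True)]
best, best_cost = random_walk_optimize(
    plan, [swap_adjacent_filters, push_filter_before_map])
```

After the walk, both filters end up ahead of the expensive map and the more selective filter tends to run first, so `best_cost` is well below `cost(plan)`. The same shape of search scales to far richer rule sets than a fixed rewrite catalog.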
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of relational operators in semantic query processing
Optimizing multi-modal analytics with LLM-native logical and physical strategies
Reducing runtime and cost in semantic-aware data analytics systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic logical optimizer using natural language transformation rules
Cost-aware physical optimizer with improvement-score metric
Computation reuse and evaluation pushdown for efficiency
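The cost-aware physical optimization in the list above can be illustrated with a small sketch: for each semantic operator, choose the LLM backend with the best improvement score, here taken to mean estimated quality gained per unit of extra cost relative to the cheapest backend. The backend names, prices, quality estimates, and the exact score formula are invented for illustration; the paper's actual metric and backend profiles will differ.

```python
BACKENDS = {
    # name: (estimated quality on this operator type, cost per 1k calls)
    "small-llm":  (0.72, 0.4),
    "medium-llm": (0.84, 2.0),
    "large-llm":  (0.91, 12.0),
}

def improvement_score(quality, cost, base_quality, base_cost):
    """Quality gained per unit of extra cost, vs. the cheapest backend."""
    extra = cost - base_cost
    if extra <= 0:
        return float("inf")  # the cheapest backend is its own baseline
    return (quality - base_quality) / extra

def select_backend(backends, min_quality=0.8):
    """Among backends meeting the quality floor, maximize the score;
    fall back to the cheapest backend if none qualifies."""
    base_name = min(backends, key=lambda n: backends[n][1])
    bq, bc = backends[base_name]
    eligible = {n: qc for n, qc in backends.items() if qc[0] >= min_quality}
    if not eligible:
        return base_name
    return max(eligible,
               key=lambda n: improvement_score(*eligible[n], bq, bc))

chosen = select_backend(BACKENDS, min_quality=0.8)
```

With these toy numbers the medium backend wins: the large model's extra quality does not justify its price, while the small model misses the quality floor. Lowering the floor makes the cheapest qualifying backend the natural pick.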