Towards a Multimodal Stream Processing System

📅 2025-10-16
🤖 AI Summary
To address the challenge of processing multimodal streaming queries under strict latency and throughput constraints, this paper introduces the first end-to-end stream processing system that uses multimodal large language models (MLLMs) as core operators. We propose a three-layer joint optimization framework, spanning the logical, physical, and semantic levels, that combines query rewriting, cross-modal operator pushdown, and semantic compression to significantly reduce inference overhead while preserving model accuracy. Our prototype system, system{}, achieves more than 10× higher throughput and an order of magnitude lower end-to-end latency than state-of-the-art approaches. This work integrates MLLMs deeply into stream processing architectures, laying a systematic foundation for scalable, low-latency multimodal real-time analytics and opening a new research direction for multimodal stream systems.
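The summary names "cross-modal operator pushdown" without describing it. As a hedged illustration of the general idea only (classic predicate pushdown applied to a model operator, not the paper's algorithm), the sketch below places a cheap metadata filter ahead of an expensive MLLM predicate so that fewer frames reach the model. `Frame`, `CountingModel`, `naive_plan`, `pushed_down_plan`, and both predicates are hypothetical; `looks_relevant` is a placeholder standing in for an MLLM call. The reordering is only valid because both predicates are side-effect-free and combined by conjunction.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    camera_id: str
    timestamp: float


class CountingModel:
    """Hypothetical stand-in for an expensive MLLM predicate; counts invocations."""

    def __init__(self, answer: Callable[[Frame], bool]) -> None:
        self.answer = answer
        self.calls = 0

    def __call__(self, frame: Frame) -> bool:
        self.calls += 1
        return self.answer(frame)


def naive_plan(stream: Iterable[Frame], model: CountingModel,
               cheap_pred: Callable[[Frame], bool]) -> Iterator[Frame]:
    # MLLM predicate evaluated first, so every frame reaches the model.
    for f in stream:
        if model(f) and cheap_pred(f):
            yield f


def pushed_down_plan(stream: Iterable[Frame], model: CountingModel,
                     cheap_pred: Callable[[Frame], bool]) -> Iterator[Frame]:
    # Cheap predicate pushed below the MLLM operator; `and` short-circuits,
    # so the model only sees frames that survive the cheap filter.
    for f in stream:
        if cheap_pred(f) and model(f):
            yield f


def on_cam1(f: Frame) -> bool:
    # Cheap metadata filter (hypothetical query condition).
    return f.camera_id == "cam1"


def looks_relevant(f: Frame) -> bool:
    # Placeholder for an MLLM answer such as "does this frame show a red car?".
    return int(f.timestamp) % 2 == 1


# Toy stream: 100 frames assigned round-robin to 4 cameras.
frames = [Frame(camera_id=f"cam{i % 4}", timestamp=float(i)) for i in range(100)]

naive_model = CountingModel(looks_relevant)
pushed_model = CountingModel(looks_relevant)
naive_out = list(naive_plan(frames, naive_model, on_cam1))
pushed_out = list(pushed_down_plan(frames, pushed_model, on_cam1))

print(len(naive_out), naive_model.calls, pushed_model.calls)  # 25 100 25
```

Both plans return the same 25 frames, but the pushed-down plan invokes the model 25 times instead of 100. In a real system the saving depends on the selectivity of the cheap predicate.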

📝 Abstract
In this paper, we present a vision for a new generation of multimodal streaming systems that embed multimodal large language models (MLLMs) as first-class operators, enabling real-time query processing across multiple modalities. Achieving this is non-trivial: while recent work has integrated MLLMs into databases for multimodal queries, streaming systems require fundamentally different approaches because of their strict latency and throughput requirements. Our approach proposes novel optimizations at all levels, including logical, physical, and semantic query transformations that reduce model load to improve throughput while preserving accuracy. We demonstrate this with system{}, a prototype that leverages such optimizations to improve performance by more than an order of magnitude. Moreover, we discuss a research roadmap that outlines open research challenges for building scalable and efficient multimodal stream processing systems.
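The abstract does not spell out its semantic transformations. One plausible way to reduce model load on a stream, shown here purely as a hypothetical sketch, is to reuse the previous MLLM answer when a new input is nearly identical to the last input the model actually evaluated (for example, consecutive video frames). The embeddings are assumed to come from some cheap encoder; `ReuseOnSimilarInput`, `cosine`, `direction_model`, and the 0.99 threshold are illustrative assumptions, not part of system{}.

```python
import math
from typing import Callable, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ReuseOnSimilarInput:
    """Hypothetical wrapper around an expensive model: reuses the last answer when
    the new embedding is within `threshold` cosine similarity of the last one
    the model actually evaluated."""

    def __init__(self, model: Callable[[Sequence[float]], str],
                 threshold: float = 0.99) -> None:
        self.model = model
        self.threshold = threshold
        self.last_emb: Optional[Sequence[float]] = None
        self.last_answer: Optional[str] = None
        self.model_calls = 0

    def __call__(self, emb: Sequence[float]) -> str:
        if self.last_emb is not None and cosine(emb, self.last_emb) >= self.threshold:
            # Approximate path: skip the model and reuse its previous answer.
            return self.last_answer
        self.model_calls += 1
        self.last_answer = self.model(emb)
        self.last_emb = emb  # anchor for future similarity checks
        return self.last_answer


def direction_model(emb: Sequence[float]) -> str:
    # Placeholder for an MLLM call on the underlying frame.
    return "horizontal" if emb[0] > emb[1] else "vertical"


stream = [[1.0, 0.0], [0.999, 0.01], [1.0, 0.005],
          [0.0, 1.0], [0.01, 0.999], [1.0, 0.0]]
op = ReuseOnSimilarInput(direction_model, threshold=0.99)
answers = [op(e) for e in stream]
print(answers, op.model_calls)
# ['horizontal', 'horizontal', 'horizontal', 'vertical', 'vertical', 'horizontal'] 3
```

Similarity is measured against the last *evaluated* embedding rather than the last seen one, so a slow drift across many frames cannot chain reused answers indefinitely. The threshold directly trades accuracy for throughput; here it halves the model calls without changing any answer, but that outcome is specific to this toy stream.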
Problem

Research questions and friction points this paper is trying to address.

Developing multimodal streaming systems with MLLM operators
Optimizing latency and throughput for real-time processing
Proposing transformations to reduce model load while maintaining accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embeds MLLMs as first-class operators
Applies logical, physical, semantic query optimizations
Reduces model load to significantly improve throughput