Artic: AI-oriented Real-time Communication for MLLM Video Assistant

📅 2026-02-13

📈 Citations: 0

✨ Influential: 0

📝 Abstract

The AI Video Assistant is emerging as a new paradigm for Real-time Communication (RTC), in which one peer is a Multimodal Large Language Model (MLLM) deployed in the cloud. This makes interaction between humans and AI more intuitive, akin to chatting with a real person. However, a fundamental mismatch exists between current RTC frameworks and AI Video Assistants, stemming from a drastic shift in Quality of Experience (QoE) requirements and from more challenging network conditions. Measurements on our production prototype confirm that current RTC frameworks fail in this setting, causing latency spikes and accuracy drops. To address these challenges, we propose Artic, an AI-oriented RTC framework for MLLM Video Assistants that explores the shift from "humans watching video" to "AI understanding video." Specifically, Artic proposes: (1) Response Capability-aware Adaptive Bitrate, which exploits the saturation of MLLM accuracy to proactively cap the bitrate, reserving bandwidth headroom that absorbs future fluctuations and thereby reduces latency; (2) Zero-overhead Context-aware Streaming, which allocates the limited bitrate to the regions most important for the response, maintaining accuracy even at ultra-low bitrates; and (3) the Degraded Video Understanding Benchmark, the first benchmark to evaluate how RTC-induced video degradation affects MLLM accuracy. Prototype experiments using real-world uplink traces show that, compared with existing methods, Artic improves accuracy by 15.12% and reduces latency by 135.31 ms. We will release the benchmark and code at https://github.com/pku-netvideo/DeViBench.
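The bitrate-capping idea in contribution (1) can be illustrated with a minimal sketch. This is not Artic's actual algorithm, which the abstract does not specify. It only assumes that an accuracy-versus-bitrate curve is available, and it caps the sending bitrate at the lowest point where accuracy is within a small tolerance of its maximum. The names `AccuracyPoint`, `saturation_bitrate`, `capped_bitrate`, the `tolerance` parameter, and all numbers in the curve are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyPoint:
    """One hypothetical measurement: MLLM accuracy at a given video bitrate."""
    bitrate_kbps: float
    accuracy: float  # fraction in [0, 1]


def saturation_bitrate(curve, tolerance=0.01):
    """Return the lowest bitrate whose accuracy is within `tolerance`
    of the best accuracy observed on the curve."""
    if not curve:
        raise ValueError("curve must contain at least one point")
    best = max(p.accuracy for p in curve)
    for point in sorted(curve, key=lambda p: p.bitrate_kbps):
        if point.accuracy >= best - tolerance:
            return point.bitrate_kbps
    raise AssertionError("unreachable: the best point always qualifies")


def capped_bitrate(estimated_bandwidth_kbps, curve, tolerance=0.01):
    """Send at the bandwidth estimate, but never above the saturation point.
    The unused bandwidth is headroom that can absorb later fluctuations."""
    return min(estimated_bandwidth_kbps, saturation_bitrate(curve, tolerance))


# Illustrative (made-up) curve: accuracy flattens out around 800 kbps.
CURVE = [
    AccuracyPoint(200, 0.60),
    AccuracyPoint(400, 0.78),
    AccuracyPoint(800, 0.855),
    AccuracyPoint(1600, 0.858),
    AccuracyPoint(3200, 0.86),
]

if __name__ == "__main__":
    for bandwidth in (500, 2500):
        target = capped_bitrate(bandwidth, CURVE)
        print(f"estimate={bandwidth} kbps, target={target} kbps, "
              f"headroom={bandwidth - target} kbps")
```

With this made-up curve, a 2500 kbps bandwidth estimate yields an 800 kbps target and leaves 1700 kbps of headroom, while a 500 kbps estimate lies below the saturation point and is used in full.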

Problem

Research questions and friction points this paper is trying to address.

Real-time Communication

Multimodal Large Language Model

Video Assistant

Quality of Experience

Latency

Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-oriented RTC

Multimodal Large Language Model

Adaptive Bitrate Control