Make a Video Call with LLM: A Measurement Campaign over Five Mainstream Apps

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing research lacks a systematic evaluation of AI-powered video chat systems. This paper introduces the first four-dimensional benchmark for AI video calling, covering quality, latency, internal mechanisms, and system overhead, built upon a customized testbed and real-time audio-video analysis. We conduct empirical evaluations across five mainstream systems. The benchmark enables multi-granularity quantitative assessment, uncovering critical bottlenecks in computational scheduling, cross-modal coordination, and resource contention, while establishing realistic performance baselines. Key contributions include: (1) an open-source, reproducible, and comprehensive evaluation framework; (2) identification of the optimization directions that most significantly affect end-to-end user experience; and (3) empirically grounded insights to guide architectural design and algorithmic optimization for AI video communication systems.
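To make the four dimensions concrete, here is a minimal sketch of what a single benchmark sample could record per call session. This is an illustration under our own assumptions: the field names and metric choices (e.g., a MOS-style audio score) are invented here, not the paper's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CallSample:
    """One measurement sample from a single AI video call session.

    Hypothetical schema for illustration; the paper's open-source
    framework may structure its metrics differently.
    """
    app: str                    # which of the five chatbots was tested
    # Quality: fidelity of the agent's audio/video output
    video_quality: float        # e.g., a no-reference video quality score
    audio_quality: float        # e.g., a MOS-style speech quality estimate
    # Latency: responsiveness of the interaction
    response_latency_ms: float  # end of user speech -> first agent audio
    # Internal mechanisms: observable pipeline behavior
    handled_interruption: bool  # did the agent yield when barged in on?
    # System overhead: client-side resource cost during the call
    cpu_percent: float
    uplink_kbps: float
```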

📝 Abstract
In 2025, Large Language Model (LLM) services launched a new feature -- AI video chat -- allowing users to interact with AI agents via real-time video communication (RTC), just as they would with real people. Despite its significance, no systematic study has characterized the performance of existing AI video chat systems. To address this gap, this paper proposes a comprehensive benchmark with carefully designed metrics across four dimensions: quality, latency, internal mechanisms, and system overhead. Using custom testbeds, we then evaluate five mainstream AI video chatbots with this benchmark. This work provides the research community with a baseline of real-world performance and identifies unique system bottlenecks. At the same time, our benchmarking results open up several research questions for future optimization of AI video chatbots.
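As one example of the kind of latency metric such a benchmark needs, the sketch below times the gap between the end of user speech and the first agent audio frame. It is a minimal illustration under our own assumptions (the class name and event hooks are invented); the paper's testbed instruments the apps with its own real-time audio-video analysis.

```python
import time

class ResponseLatencyProbe:
    """Minimal sketch of an end-to-end response latency probe.

    Assumes the surrounding testbed detects voice activity on both
    the uplink (user) and downlink (agent) audio streams and calls
    these hooks; the hook names are hypothetical.
    """

    def __init__(self) -> None:
        self._user_speech_end: float | None = None

    def on_user_speech_end(self) -> None:
        # Timestamp the moment the user stops talking.
        self._user_speech_end = time.monotonic()

    def on_agent_audio_start(self) -> float | None:
        # Return the response latency in milliseconds, or None if the
        # agent spoke without a preceding user turn (e.g., a greeting).
        if self._user_speech_end is None:
            return None
        latency_ms = (time.monotonic() - self._user_speech_end) * 1000.0
        self._user_speech_end = None
        return latency_ms
```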
Problem

Research questions and friction points this paper is trying to address.

Characterizing the real-world performance of AI video chat systems
Evaluating five mainstream AI video chatbots under a common benchmark
Identifying system bottlenecks and open research questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a benchmark with carefully designed metrics across four dimensions
Evaluates five mainstream AI video chatbots on custom testbeds
Identifies system bottlenecks and open research questions (see the aggregation sketch after this list)
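To show how per-call samples from such testbed runs might be rolled up into the comparable baselines the paper reports, here is a small aggregation sketch. The chosen statistics (median, p95, mean) reflect our assumption of typical latency reporting, not the paper's published methodology.

```python
import statistics

def summarize_latency(samples_ms: list[float]) -> dict[str, float]:
    """Aggregate per-call response latencies (ms) for one chatbot into
    summary statistics suitable for a cross-app comparison table."""
    cuts = statistics.quantiles(samples_ms, n=20)  # 19 cut points
    return {
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": cuts[18],  # 19th cut point ~ 95th percentile
        "mean_ms": statistics.fmean(samples_ms),
    }

# Usage with hypothetical measurements from two apps:
results = {
    app: summarize_latency(samples)
    for app, samples in {
        "app_a": [2100.0, 2350.0, 1980.0, 2600.0, 2240.0],
        "app_b": [3400.0, 3100.0, 2900.0, 3550.0, 3300.0],
    }.items()
}
print(results)
```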