MoA-VR: A Mixture-of-Agents System Towards All-in-One Video Restoration

📅 2025-10-09
📈 Citations: 1
Influential: 0
🤖 AI Summary
Real-world videos often suffer from complex, heterogeneous degradations—including noise, compression artifacts, and low-light distortions—that pose significant challenges for restoration methods, which must generalize across diverse and compound degradation types. To address this, we propose the first hybrid agent system for video restoration, inspired by human expert collaboration and comprising three synergistic modules: degradation identification, adaptive routing-based restoration, and quality assessment. Our approach introduces a novel multi-agent architecture featuring a learnable routing mechanism that integrates vision-language models with large language models. Furthermore, we develop Res-VQ, the first restoration-oriented video quality assessment model, along with its dedicated benchmark dataset, Res-VQ-Bench. Extensive experiments demonstrate that our method achieves substantial improvements over state-of-the-art methods in both objective metrics and perceptual quality, particularly under challenging, composite degradation scenarios.

📝 Abstract
Real-world videos often suffer from complex degradations, such as noise, compression artifacts, and low-light distortions, due to diverse acquisition and transmission conditions. Existing restoration methods typically require professional manual selection of specialized models or rely on monolithic architectures that fail to generalize across varying degradations. Inspired by expert experience, we propose MoA-VR, the first Mixture-of-Agents Video Restoration system that mimics the reasoning and processing procedures of human professionals through three coordinated agents: Degradation Identification, Routing and Restoration, and Restoration Quality Assessment. Specifically, we construct a large-scale and high-resolution video degradation recognition benchmark and build a vision-language model (VLM) driven degradation identifier. We further introduce a self-adaptive router powered by large language models (LLMs), which autonomously learns effective restoration strategies by observing tool usage patterns. To assess intermediate and final processed video quality, we construct the Restored Video Quality (Res-VQ) dataset and design a dedicated VLM-based video quality assessment (VQA) model tailored for restoration tasks. Extensive experiments demonstrate that MoA-VR effectively handles diverse and compound degradations, consistently outperforming existing baselines in terms of both objective metrics and perceptual quality. These results highlight the potential of integrating multimodal intelligence and modular reasoning in general-purpose video restoration systems.
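The abstract describes a three-agent loop: identify degradations, route the video through restoration tools, and assess the result. As an illustrative sketch only (the paper's agents are VLM/LLM-based; the names, toy tools, and scalar quality model below are all hypothetical stand-ins), the control flow might look like:

```python
from dataclasses import dataclass

@dataclass
class Video:
    degradations: set   # detected degradation types
    quality: float      # scalar quality score in [0, 1]

def identify_degradations(video: Video) -> set:
    """Degradation Identification agent (stand-in for the paper's VLM identifier)."""
    return set(video.degradations)

# Toy restoration "tools": each removes one degradation and improves quality.
TOOLS = {
    "noise": lambda v: Video(v.degradations - {"noise"}, v.quality + 0.2),
    "compression": lambda v: Video(v.degradations - {"compression"}, v.quality + 0.15),
    "low_light": lambda v: Video(v.degradations - {"low_light"}, v.quality + 0.25),
}

def route(degradations: set) -> list:
    """Routing agent: returns an ordered restoration strategy (here, a fixed priority)."""
    priority = ["low_light", "noise", "compression"]
    return [d for d in priority if d in degradations]

def assess_quality(video: Video) -> float:
    """Quality-assessment agent (stand-in for the paper's Res-VQ model)."""
    return video.quality

def restore(video: Video, threshold: float = 0.9) -> Video:
    """Run the identify -> route -> restore -> assess loop until quality suffices."""
    for tool_name in route(identify_degradations(video)):
        video = TOOLS[tool_name](video)
        if assess_quality(video) >= threshold:
            break
    return video

result = restore(Video({"noise", "low_light"}, quality=0.4))
print(result.degradations, round(result.quality, 2))
```

The key design point the abstract emphasizes is that quality assessment runs on intermediate results too, so the loop can stop early once the restored video is good enough.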
Problem

Research questions and friction points this paper is trying to address.

Automatically handles diverse video degradations like noise and artifacts
Replaces manual model selection with adaptive multi-agent reasoning
Overcomes limitations of monolithic architectures for video restoration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Agents system for video restoration
Vision-language model identifies video degradations
LLM-based router autonomously learns restoration strategies
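The abstract says the router "autonomously learns effective restoration strategies by observing tool usage patterns." One minimal way to sketch that idea (this is not the paper's LLM-based algorithm; the class, tool names, and running-average update are illustrative assumptions) is a router that tracks the average quality gain each tool produced and prefers the best-performing one:

```python
from collections import defaultdict

class SelfAdaptiveRouter:
    """Illustrative sketch: learn tool preferences from observed quality gains."""

    def __init__(self, tools_by_degradation):
        # tools_by_degradation: {degradation: [candidate tool names]}
        self.tools = tools_by_degradation
        self.gain = defaultdict(float)  # running-average gain per (degradation, tool)
        self.count = defaultdict(int)

    def choose(self, degradation):
        """Pick the candidate tool with the best observed average gain."""
        return max(self.tools[degradation],
                   key=lambda t: self.gain[(degradation, t)])

    def observe(self, degradation, tool, quality_gain):
        """Update the running average after seeing one restoration outcome."""
        key = (degradation, tool)
        self.count[key] += 1
        self.gain[key] += (quality_gain - self.gain[key]) / self.count[key]

# Usage: after observing outcomes, the router prefers the more effective tool.
router = SelfAdaptiveRouter({"noise": ["denoiser_a", "denoiser_b"]})
router.observe("noise", "denoiser_a", 0.10)
router.observe("noise", "denoiser_b", 0.30)
```

After these observations, `router.choose("noise")` returns `"denoiser_b"`, mirroring the feedback loop in which the quality-assessment agent's scores inform future routing decisions.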
Lu Liu
Shanghai Jiao Tong University, Shanghai, 200240, China
Chunlei Cai
Bilibili Inc.
Video compression, Image compression, Image processing, Deep learning
Shaocheng Shen
Shanghai Jiao Tong University
Computer Vision, Generative model
Jianfeng Liang
Shanghai Jiao Tong University, Shanghai, 200240, China
Weimin Ouyang
University of Electronic Science and Technology of China, Hainan, 572400, China
Tianxiao Ye
Bilibili Inc., Shanghai, 200433, China
Jian Mao
Beihang University
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Jiangchao Yao
Shanghai Jiao Tong University
Machine Learning
Xiaoyun Zhang
Shanghai Jiao Tong University, Shanghai, 200240, China
Qiang Hu
Shanghai Jiao Tong University, Shanghai, 200240, China
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays