🤖 AI Summary
This paper addresses longstanding challenges in video restoration, namely heavy reliance on degradation priors, poor interpretability, and limited generalizability, by proposing the first unified framework that requires no degradation labels at either training or test time. Methodologically, it introduces a degradation-aware natural language grounding mechanism that leverages foundation models to achieve semantic disentanglement and learn degradation-aware prompts; because the grounded knowledge is distilled into the restoration model, the foundation model can be dropped at inference, enabling zero-overhead multi-task restoration. Key contributions include: (1) the first natural language-guided video restoration framework that operates without degradation annotations; (2) the construction of new multi-degradation benchmarks, comprising three-task (3D) and four-task (4D) settings as well as time-varying composite degradation benchmarks, including a snow-weather dataset with varying intensity; and (3) state-of-the-art performance across all proposed benchmarks, demonstrating enhanced robustness and generalization against complex, dynamic degradations.
📝 Abstract
In this work, we propose an all-in-one video restoration framework that grounds the degradation-aware semantic context of video frames in natural language via foundation models, offering interpretable and flexible guidance. Unlike prior art, our method assumes no degradation knowledge at training or test time and learns an approximation to the grounded knowledge, so that the foundation model can be safely disentangled during inference at no extra cost. Further, we call for the standardization of benchmarks in all-in-one video restoration and propose two multi-degradation benchmarks, three-task (3D) and four-task (4D), along with two time-varying composite degradation benchmarks; one of the latter is our proposed dataset with varying snow intensity, simulating how weather degradations naturally affect videos. We compare our method with prior works and report state-of-the-art performance on all benchmarks.