Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the recovery of bitstream-corrupted video when no predefined corruption masks are available, introducing a blind recovery setting and a method designed for it. The approach uses intrinsic video metadata, such as motion vectors and frame types, as corruption indicators: a dual-stream metadata encoder embeds the two signals separately and fuses them, and a prior-driven mask predictor generates pseudo masks that separate intact from corrupted regions without manual annotation. Recovery is performed by a diffusion model in which the fused metadata representation interacts with the corrupted latent features through cross-attention; hard masking recombines intact and recovered regions, and a post-refinement module reduces boundary artifacts caused by imperfect masks. The authors report that experiments show the method's superiority over existing techniques in the blind setting.

📝 Abstract

Bitstream-corrupted video recovery aims to restore realistic content degraded during video storage or transmission. Existing methods typically assume that predefined masks of corrupted regions are available, but manually annotating these masks is labor-intensive and impractical in real-world scenarios. To address this limitation, we introduce a new blind video recovery setting that removes the reliance on predefined masks. This setting presents two major challenges: accurately identifying corrupted regions and recovering content from extensive and irregular degradations. We propose a Metadata-Guided Diffusion Model (M-GDM) to tackle these challenges. Specifically, intrinsic video metadata are leveraged as corruption indicators through a dual-stream metadata encoder that separately embeds motion vectors and frame types before fusing them into a unified representation. This representation interacts with corrupted latent features via cross-attention at each diffusion step. To preserve intact regions, we design a prior-driven mask predictor that generates pseudo masks using both metadata and diffusion priors, enabling the separation and recombination of intact and recovered regions through hard masking. To mitigate boundary artifacts caused by imperfect masks, a post-refinement module enhances consistency between intact and recovered regions. Extensive experiments demonstrate the effectiveness of our method and its superiority in blind video recovery. Code is available at: https://github.com/Shuyun-Wang/M-GDM.
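
The hard-masking step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not taken from the M-GDM code base: the function names `binarize` and `hard_mask_fuse`, the 0.5 threshold, and the tiny grayscale frames are all assumptions made for illustration. The idea shown is only that pixels flagged as corrupted are taken from the recovered output, while every other pixel is copied unchanged from the input.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def binarize(soft_mask: Frame, threshold: float = 0.5) -> List[List[int]]:
    """Turn per-pixel corruption scores into a hard 0/1 mask (1 = corrupted).

    The threshold value is an assumption, not a setting from the paper.
    """
    return [[1 if score >= threshold else 0 for score in row] for row in soft_mask]


def hard_mask_fuse(corrupted: Frame, recovered: Frame, mask: List[List[int]]) -> Frame:
    """Keep input pixels where mask == 0 and take recovered pixels where mask == 1."""
    return [
        [rec if m == 1 else src for src, rec, m in zip(src_row, rec_row, mask_row)]
        for src_row, rec_row, mask_row in zip(corrupted, recovered, mask)
    ]


# Hypothetical 2x2 example: the right-hand column is flagged as corrupted.
corrupted = [[0.2, 0.9], [0.4, 0.0]]
recovered = [[0.3, 0.5], [0.5, 0.6]]
soft_mask = [[0.1, 0.8], [0.2, 0.95]]

mask = binarize(soft_mask)
fused = hard_mask_fuse(corrupted, recovered, mask)
print(mask)   # [[0, 1], [0, 1]]
print(fused)  # [[0.2, 0.5], [0.4, 0.6]]
```

Because the fusion is a per-pixel switch rather than a blend, any error in the predicted mask shows up as a visible seam, which is consistent with the abstract's motivation for a post-refinement module at region boundaries.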

Problem

Research questions and friction points this paper is trying to address.

blind video recovery

bitstream corruption

metadata-guided

corrupted region identification

video restoration

Innovation

Methods, ideas, or system contributions that make the work stand out.

blind video recovery

metadata-guided diffusion

corruption detection