Prompt-based Consistent Video Colorization

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video colorization methods suffer from temporal flickering or require extensive manual intervention, making it difficult to achieve both high fidelity and temporal consistency. This paper proposes a language-conditioned diffusion-model framework for automatic video colorization: semantic guidance from generic text prompts and automatically generated segmentation masks drives initial color generation, while inter-frame color propagation uses RAFT optical flow, augmented by an inconsistency-correction mechanism that suppresses misalignment and flickering. To our knowledge, this is the first work to apply language-guided diffusion models to video colorization without manual color specification. Our method achieves state-of-the-art performance on DAVIS30 and VIDEVO20, outperforming prior approaches on PSNR, Colorfulness, and CDC, with significant gains in color accuracy and visual coherence.
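The Colorfulness score cited above is, in most colorization papers, computed with the Hasler-Süsstrunk measure over opponent color channels; assuming that convention (the paper may use a variant), a minimal sketch:

```python
import numpy as np

def colorfulness(rgb):
    """Hasler-Susstrunk colorfulness of an RGB image (H, W, 3), values in [0, 255].

    Higher values mean more vivid, saturated output; a grayscale image scores 0.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    rg = r - g                    # red-green opponent channel
    yb = 0.5 * (r + g) - b        # yellow-blue opponent channel
    std = np.hypot(rg.std(), yb.std())
    mean = np.hypot(rg.mean(), yb.mean())
    return std + 0.3 * mean
```

Averaging this score over all output frames gives a single per-video number comparable across methods.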

📝 Abstract
Existing video colorization methods struggle with temporal flickering or demand extensive manual input. We propose a novel approach automating high-fidelity video colorization using rich semantic guidance derived from language and segmentation. We employ a language-conditioned diffusion model to colorize grayscale frames. Guidance is provided via automatically generated object masks and textual prompts; our primary automatic method uses a generic prompt, achieving state-of-the-art results without specific color input. Temporal stability is achieved by warping color information from previous frames using optical flow (RAFT); a correction step detects and fixes inconsistencies introduced by warping. Evaluations on standard benchmarks (DAVIS30, VIDEVO20) show our method achieves state-of-the-art performance in colorization accuracy (PSNR) and visual realism (Colorfulness, CDC), demonstrating the efficacy of automated prompt-based guidance for consistent video colorization.
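The propagation step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it warps the previous frame's colors into the current frame with a precomputed backward flow (RAFT would supply this), and flags unreliable pixels with a standard forward-backward consistency check, which is one common way to realize the inconsistency-correction idea:

```python
import numpy as np

def warp_nearest(img, flow):
    """Warp img (H, W, C) into the current frame using backward flow
    (H, W, 2) mapping current-frame pixels to previous-frame positions.
    Nearest-neighbor sampling keeps the sketch dependency-free."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]

def propagate_colors(prev_color, flow_bw, flow_fw, tol=1.0):
    """Warp previous-frame colors forward and mark consistent pixels.

    flow_bw: flow from current to previous frame; flow_fw: previous to current.
    Where the two flows do not cancel (occlusion, misalignment), the pixel is
    flagged so a correction step can re-colorize it instead of trusting the warp.
    """
    warped = warp_nearest(prev_color, flow_bw)
    # forward-backward check: flow_bw plus the warped forward flow should be ~0
    fb = flow_bw + warp_nearest(flow_fw, flow_bw)
    consistent = np.linalg.norm(fb, axis=-1) < tol
    return warped, consistent
```

Pixels failing the check would be filled by the diffusion colorizer rather than by the warped colors, which is what suppresses flicker at occlusion boundaries.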
Problem

Research questions and friction points this paper is trying to address.

Per-frame colorization produces temporal flickering, while existing consistent methods demand extensive manual color input.
Propagating color between frames with optical flow introduces misalignment errors that must be detected and corrected.
Achieving both high color fidelity and realism fully automatically, without user-specified colors, remains an open problem.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-conditioned diffusion model for colorization
Automatic object masks and generic prompts for guidance
Optical flow warping with correction for temporal stability
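The three contributions above compose into one per-frame loop. A hedged end-to-end sketch, where `colorize_with_diffusion` and `estimate_flow` are stand-in stubs for the paper's diffusion model and RAFT (hypothetical names, not the authors' code):

```python
import numpy as np

def colorize_with_diffusion(gray, prompt):
    """Stand-in for the language-conditioned diffusion colorizer (hypothetical).
    Here it just tiles the gray channel; the real model predicts color."""
    return np.repeat(gray[..., None], 3, axis=-1)

def estimate_flow(cur_gray, prev_gray):
    """Stand-in for RAFT; returns zero backward flow (static-scene assumption)."""
    return np.zeros(cur_gray.shape + (2,))

def warp(img, flow):
    """Nearest-neighbor backward warp of img (H, W, C) by flow (H, W, 2)."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[sy, sx]

def colorize_video(gray_frames, prompt="a colorful natural scene"):
    """Colorize frame 0 from a generic prompt, then propagate colors frame to
    frame, re-colorizing pixels flagged inconsistent after warping."""
    out = [colorize_with_diffusion(gray_frames[0], prompt)]
    for prev_g, cur_g in zip(gray_frames, gray_frames[1:]):
        flow = estimate_flow(cur_g, prev_g)        # backward flow: cur -> prev
        warped = warp(out[-1], flow)
        fallback = colorize_with_diffusion(cur_g, prompt)
        # simple inconsistency test: warped luminance vs. the observed gray frame
        bad = np.abs(warped.mean(-1) - cur_g) > 5.0
        out.append(np.where(bad[..., None], fallback, warped))
    return out
```

The luminance threshold used here is a placeholder for the paper's inconsistency-correction mechanism; the structure of the loop, not the stubs, is the point.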