🤖 AI Summary
This work addresses efficient video color and illumination editing under strict structural and temporal consistency constraints, a task in which existing methods depend on costly paired training data. We propose a self-supervised disentangled editing framework that leverages the physical priors embedded in pretrained video generative models. A disentangled data perturbation pipeline trains the model to combine structure from the source video with color-illumination attributes from a reference image. To mitigate the discretization errors inherent in flow-based models, the approach introduces a residual velocity field together with a structural distortion consistency regularizer, which preserve structure and temporal coherence and enable zero-shot generalization. Extensive experiments show improved visual quality and lower computational cost across diverse tasks, including relighting, recoloring, low-light enhancement, day-night conversion, and object-level editing, without requiring any paired training data.
📝 Abstract
Video chroma-lux editing, which aims to modify illumination and color while preserving structural and temporal fidelity, remains a significant challenge. Existing methods typically rely on expensive supervised training with synthetic paired data. This paper proposes VibeFlow, a novel self-supervised framework that leverages the intrinsic physical understanding of pre-trained video generation models. Instead of learning color and light transitions from scratch, we introduce a disentangled data perturbation pipeline that forces the model to adaptively recombine structure from source videos with color-illumination cues from reference images, enabling robust disentanglement in a self-supervised manner. Furthermore, to correct the discretization errors inherent in flow-based models, we introduce Residual Velocity Fields alongside a Structural Distortion Consistency Regularization, ensuring rigorous structural preservation and temporal coherence. Our framework eliminates the need for costly training resources and generalizes zero-shot to diverse applications, including video relighting, recoloring, low-light enhancement, day-night translation, and object-specific color editing. Extensive experiments demonstrate that VibeFlow achieves high visual quality with significantly reduced computational overhead. The project page is publicly available at https://lyf1212.github.io/VibeFlow-webpage.
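The abstract does not specify how the disentangled perturbations are built. The sketch below is one plausible reading, not the VibeFlow pipeline. It assumes that the source clip receives a color-illumination perturbation (a per-channel gain and a gamma, shared across frames) while keeping its layout, that the reference frame receives a structural perturbation (random crop plus horizontal flip) while keeping its colors, and that the training target is the untouched clip. The function names `perturb_color`, `perturb_structure`, and `make_triplet`, and all parameter ranges, are hypothetical.

```python
import numpy as np


def perturb_color(video: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical color-illumination perturbation: one random per-channel gain
    and one gamma, shared by every frame so the clip stays temporally consistent."""
    gain = rng.uniform(0.5, 1.5, size=(1, 1, 1, 3))  # broadcasts over (T, H, W, 3)
    gamma = rng.uniform(0.6, 1.6)                     # assumed range, not from the paper
    return np.clip(video * gain, 0.0, 1.0) ** gamma


def perturb_structure(frame: np.ndarray, rng: np.random.Generator,
                      crop: float = 0.8) -> np.ndarray:
    """Hypothetical structural perturbation: random crop plus horizontal flip,
    so the reference keeps its colors but not its spatial layout."""
    h, w, _ = frame.shape
    ch, cw = int(h * crop), int(w * crop)
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return frame[y:y + ch, x:x + cw][:, ::-1]


def make_triplet(video: np.ndarray, rng: np.random.Generator):
    """Build a (source, reference, target) training example from one clip.

    video: float array of shape (T, H, W, 3) with values in [0, 1].
    """
    source = perturb_color(video, rng)                  # layout kept, color/light altered
    ref_idx = rng.integers(0, video.shape[0])           # pick one frame as the reference
    reference = perturb_structure(video[ref_idx], rng)  # colors kept, layout altered
    target = video                                      # the original, unperturbed clip
    return source, reference, target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((8, 64, 64, 3))
    source, reference, target = make_triplet(clip, rng)
    print(source.shape, reference.shape, target.shape)
```

Under these assumptions, a model trained to map `(source, reference)` to `target` cannot simply copy either input. The source carries the wrong colors and the reference carries a scrambled layout, so the model has to take structure from the former and color-illumination from the latter.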