Tuning-free Visual Effect Transfer across Videos

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods struggle to transfer complex dynamic visual effects, such as dynamic lighting and character deformation, to target videos or images using only text prompts or keyframes. This work proposes RefVFX, the first end-to-end, fine-tuning-free framework for reference-based video effect transfer, one that preserves the motion and structural integrity of the target content while generalizing across effect categories. Key contributions include the first large-scale triplet dataset for video visual effects, a scalable automated synthesis pipeline for paired videos, and augmentation of the dataset with LoRA-derived image-to-video effects and programmatically composed temporal effects. Experiments demonstrate that RefVFX significantly outperforms prompt-driven baselines in visual consistency, temporal coherence, and generalization, achieving state-of-the-art performance in both human preference studies and quantitative metrics.
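
To make the dataset contribution concrete, the sketch below shows what a single record in such a triplet dataset might look like. This is a minimal Python illustration; the class and field names (EffectTriplet, reference_effect, provenance, and the example file paths) are hypothetical and do not come from the paper or its released data.

```python
# Hypothetical record layout for a reference/input/output effect triplet.
# All names and tag values are illustrative assumptions, not the paper's schema.
from dataclasses import dataclass

@dataclass
class EffectTriplet:
    reference_effect: str          # video demonstrating the effect to transfer
    source: str                    # target image or video the effect is applied to
    output: str                    # result video depicting the transferred effect
    effect_category: str           # e.g. "dynamic_lighting", "character_transform"
    source_is_image: bool = False  # image-to-video vs. video-to-video sample
    provenance: str = "pipeline"   # "pipeline", "lora", or "procedural"

sample = EffectTriplet(
    reference_effect="refs/lightning_aura.mp4",
    source="inputs/dancer.mp4",
    output="outputs/dancer_lightning_aura.mp4",
    effect_category="dynamic_lighting",
)
```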

📝 Abstract
We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While existing methods excel at prompt-based or keyframe-conditioned editing, they struggle with dynamic temporal effects such as dynamic lighting changes or character transformations, which are difficult to describe via text or static conditions. Transferring a video effect is challenging, as the model must integrate the new temporal dynamics with the input video's existing motion and appearance. To address this, we introduce a large-scale dataset of triplets, where each triplet consists of a reference effect video, an input image or video, and a corresponding output video depicting the transferred effect. Creating this data is non-trivial, especially the video-to-video effect triplets, which do not exist naturally. To generate these, we propose a scalable automated pipeline that creates high-quality paired videos designed to preserve the input's motion and structure while transforming it according to a fixed, repeatable effect. We then augment this data with image-to-video effects derived from LoRA adapters and code-based temporal effects generated through programmatic composition. Building on our new dataset, we train our reference-conditioned model on recent text-to-video backbones. Experimental results demonstrate that RefVFX produces visually consistent and temporally coherent edits, generalizes across unseen effect categories, and outperforms prompt-only baselines in both quantitative metrics and human preference. See our website at https://tuningfreevisualeffects-maker.github.io/Tuning-free-Visual-Effect-Transfer-across-Videos-Project-Page/
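
The "code-based temporal effects generated through programmatic composition" lend themselves to a small illustration. The sketch below is an assumption about what such procedural effects could look like, not the paper's actual pipeline: two deterministic frame transforms (a sinusoidal lighting pulse and a color drift, both hypothetical) are composed into one repeatable effect that turns any input clip into a paired output clip.

```python
# Minimal sketch of procedural, repeatable temporal effects composed in code.
# The effect functions here are hypothetical examples, not the paper's effects.
import numpy as np

def pulse_lighting(frames: np.ndarray, period: int = 24, depth: float = 0.4) -> np.ndarray:
    """Sinusoidally modulate global brightness over time."""
    t = np.arange(frames.shape[0])
    gain = 1.0 - depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * t / period))
    return np.clip(frames * gain[:, None, None, None], 0.0, 1.0)

def color_drift(frames: np.ndarray, shift: float = 0.15) -> np.ndarray:
    """Linearly drift the red/blue balance across the clip."""
    t = np.linspace(0.0, 1.0, frames.shape[0])
    out = frames.copy()
    out[..., 0] *= (1.0 + shift * t)[:, None, None]  # warm the red channel
    out[..., 2] *= (1.0 - shift * t)[:, None, None]  # cool the blue channel
    return np.clip(out, 0.0, 1.0)

def compose(*effects):
    """Chain per-clip transforms into one deterministic effect."""
    def apply(frames):
        for fx in effects:
            frames = fx(frames)
        return frames
    return apply

# frames: (T, H, W, 3) float array in [0, 1]; random stand-in for a real clip
frames = np.random.rand(48, 64, 64, 3).astype(np.float32)
effect = compose(pulse_lighting, color_drift)
output_clip = effect(frames)  # (input, output) pair usable as training supervision
```

Because the composed transform is deterministic, re-running it on any input reproduces the same effect, which is what makes such programmatically generated pairs usable as training data.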
Problem

Research questions and friction points this paper is trying to address.

visual effect transfer
temporal effects
video editing
dynamic lighting
character transformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

visual effect transfer
temporal dynamics
reference-conditioned generation
tuning-free video editing
automated dataset synthesis