🤖 AI Summary
This work addresses high-fidelity visual effects (VFX) generation, which typically demands large datasets and substantial compute because spatial textures and temporal dynamics are tightly coupled, making it hard to deploy in resource-constrained settings. The paper introduces EasyVFX, a framework built on frequency-domain decoupling: effects are decomposed into high-frequency components (spatial detail) and low-frequency components (global motion). Training proceeds in two stages. First, a Frequency-aware Mixture-of-Experts (Freq-MoE) model with soft routing learns generalizable VFX priors. Second, test-time training with a frequency-constraint loss adapts the pre-trained model to a novel effect in about 100 optimization steps on a single GPU. The authors report structurally coherent, visually realistic results with reduced reliance on large datasets and extensive compute.
📝 Abstract
Generating high-fidelity visual effects (VFX) typically demands massive datasets and prohibitive compute due to the intricate coupling of spatial textures and temporal dynamics. In this paper, we introduce EasyVFX, a resource-efficient framework that achieves realistic VFX synthesis under stringent constraints. Our core idea is frequency-domain decomposition: we observe that the complexity of VFX can be significantly reduced by decoupling high-frequency components, which represent intricate spatial appearance, from low-frequency components, which capture global motion dynamics. This spectral disentanglement turns a high-dimensional learning problem into manageable sub-tasks, lowering the optimization barrier and reducing data dependency. Building on this insight, we propose a two-stage training paradigm. First, we design a Frequency-aware Mixture-of-Experts (Freq-MoE) architecture. Using a soft routing mechanism, the model assigns specialized experts to distinct spectral bands, enabling them to learn robust priors for appearance and motion dynamics. This specialization allows the model to acquire foundational VFX knowledge with fewer GPU resources. Second, we introduce a Test-Time Training strategy driven by a novel Frequency-constraint Loss. This allows the pre-trained model to adapt quickly to specific, unseen effects through localized optimization, requiring only about 100 steps on a single GPU. Experimental results demonstrate that EasyVFX produces structurally consistent and visually compelling effects, indicating that frequency-aware learning is a key enabler for democratizing professional-grade VFX.
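To make the frequency-domain decomposition concrete, here is a minimal sketch of splitting a signal into a low-frequency band and a high-frequency band. It is an illustration only, not the paper's implementation: the abstract does not say which transform, which axes (spatial, temporal, or both), or which cutoff EasyVFX uses. The function names `dft`, `idft`, and `split_bands`, the hard cutoff `cutoff=3`, and the 1D toy signal are all assumptions chosen for brevity.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns a complex sequence."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_bands(signal, cutoff):
    """Split a real signal into low- and high-frequency parts.

    `cutoff` is a hypothetical hyperparameter (not from the paper): bins whose
    distance from the DC bin is <= cutoff go to the low band, the rest to the
    high band.
    """
    n = len(signal)
    spectrum = dft(signal)
    low_spec, high_spec = [], []
    for k, coeff in enumerate(spectrum):
        freq = min(k, n - k)  # bins k and n-k carry the same frequency
        if freq <= cutoff:
            low_spec.append(coeff)
            high_spec.append(0j)
        else:
            low_spec.append(0j)
            high_spec.append(coeff)
    # The mask is symmetric, so both inverse transforms are real up to rounding.
    low = [v.real for v in idft(low_spec)]
    high = [v.real for v in idft(high_spec)]
    return low, high


# Toy signal: a slow oscillation (stand-in for global motion) plus a fast,
# smaller one (stand-in for fine detail).
n = 32
slow = [math.sin(2 * math.pi * 1 * t / n) for t in range(n)]
fast = [0.3 * math.sin(2 * math.pi * 10 * t / n) for t in range(n)]
signal = [s + f for s, f in zip(slow, fast)]

low, high = split_bands(signal, cutoff=3)

# The two bands sum back to the input, and each band isolates one component.
recon_err = max(abs(l + h - x) for l, h, x in zip(low, high, signal))
low_err = max(abs(l - s) for l, s in zip(low, slow))
high_err = max(abs(h - f) for h, f in zip(high, fast))
print(recon_err, low_err, high_err)
```

Because the two masks partition the spectrum, the bands always sum back to the original signal, so each band can in principle be handled by a separate sub-model without losing information. For video, the same masking idea would presumably be applied with a multi-dimensional transform over frames, and a soft (smoothly weighted) mask could replace the hard cutoff; both are assumptions here rather than details confirmed by the abstract.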