🤖 AI Summary
Existing video generation models face two key bottlenecks in VFX production: (1) they support only single-effect LoRA fine-tuning, and (2) they lack spatial controllability, hindering concurrent multi-effect generation and precise localization. To address these, we propose Omni-Effects, the first unified framework enabling multi-effect video synthesis with pixel-level spatial control. Our method introduces a LoRA-based Mixture-of-Experts (LoRA-MoE) architecture to mitigate cross-task interference, alongside Spatial-Aware Prompt (SAP) and Independent-Information Flow (IIF) modules for decoupled multi-effect modeling and fine-grained spatial conditioning. By integrating image editing with first-last-frame-driven video synthesis, we construct the high-quality Omni-VFX benchmark dataset. Experiments demonstrate that Omni-Effects enables text-guided, simultaneous generation of diverse visual effects with arbitrary region specification, significantly surpassing state-of-the-art methods in flexibility, controllability, and practical applicability.
📝 Abstract
Visual effects (VFX) are visual enhancements fundamental to modern cinematic production. Although video generation models offer cost-efficient solutions for VFX production, current methods are constrained by per-effect LoRA training, which limits generation to single effects. This fundamental limitation impedes applications that require spatially controllable composite effects, i.e., the concurrent generation of multiple effects at designated locations. Moreover, integrating diverse effects into a unified framework faces major challenges: interference between effect variations and spatial uncontrollability during multi-VFX joint training. To tackle these challenges, we propose Omni-Effects, the first unified framework capable of generating prompt-guided effects and spatially controllable composite effects. The core of our framework comprises two key innovations: (1) a LoRA-based Mixture of Experts (LoRA-MoE), which employs a group of expert LoRAs to integrate diverse effects within a unified model while effectively mitigating cross-task interference; and (2) a Spatial-Aware Prompt (SAP), which incorporates spatial mask information into the text tokens to enable precise spatial control. Furthermore, we introduce an Independent-Information Flow (IIF) module integrated within the SAP that isolates the control signals of individual effects to prevent unwanted blending. To facilitate this research, we construct a comprehensive VFX dataset, Omni-VFX, via a novel data collection pipeline combining image editing and First-Last Frame-to-Video (FLF2V) synthesis, and introduce a dedicated VFX evaluation framework for validating model performance. Extensive experiments demonstrate that Omni-Effects achieves precise spatial control and diverse effect generation, enabling users to specify both the category and location of desired effects.
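The general shape of a LoRA-based Mixture of Experts, as described above (a frozen base layer plus a group of expert LoRAs whose contributions are combined), can be sketched as follows. This is a minimal illustration, not the paper's implementation: the expert count, rank, and soft token-wise routing are assumptions for the sketch.

```python
import torch
import torch.nn as nn


class LoRAExpert(nn.Module):
    """One low-rank adapter: contributes B @ A @ x on top of the base layer."""

    def __init__(self, d_in: int, d_out: int, rank: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init: no effect at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.A.T @ self.B.T


class LoRAMoE(nn.Module):
    """Frozen base linear layer plus a router-weighted mixture of expert LoRAs,
    so several effects can share one model without overwriting each other."""

    def __init__(self, d_in: int, d_out: int, n_experts: int = 4, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():  # base weights stay frozen
            p.requires_grad_(False)
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, rank) for _ in range(n_experts)
        )
        self.router = nn.Linear(d_in, n_experts)  # assumed: learned soft routing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = torch.softmax(self.router(x), dim=-1)  # (..., n_experts)
        out = self.base(x)
        for i, expert in enumerate(self.experts):
            out = out + gates[..., i : i + 1] * expert(x)
        return out
```

Only the expert and router parameters are trainable, which mirrors the abstract's point: each expert LoRA can specialize in one effect while the shared frozen backbone keeps the tasks from interfering with each other.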