FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control

📅 2026-02-13
📈 Citations: 0
✨ Influential: 0

📝 Abstract
Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
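
The abstract does not spell out how the multi-frequency positional encoding is computed, so the following is only an illustrative sketch and not FlexAM's implementation. It assumes a common sinusoidal scheme (sine and cosine at frequencies that double per band) and treats depth as a third coordinate alongside x and y. The function name `multi_frequency_encoding`, the `num_bands` parameter, and the sample points are all hypothetical.

```python
import math
from typing import List, Sequence


def multi_frequency_encoding(point: Sequence[float], num_bands: int = 4) -> List[float]:
    """Encode each coordinate with sin/cos at frequencies pi * 2^0 ... pi * 2^(num_bands-1).

    Assumption: a generic sinusoidal encoding, not the paper's exact formulation.
    """
    features: List[float] = []
    for coord in point:
        for band in range(num_bands):
            freq = (2.0 ** band) * math.pi
            features.append(math.sin(freq * coord))
            features.append(math.cos(freq * coord))
    return features


def band_distance(enc_a: List[float], enc_b: List[float], band: int) -> float:
    """Euclidean distance between the sin/cos pair of the first coordinate at one band."""
    i = 2 * band
    return math.hypot(enc_a[i] - enc_b[i], enc_a[i + 1] - enc_b[i + 1])


# Two hypothetical points (x, y, depth) that differ by a small shift in x.
point_a = (0.50, 0.50, 0.20)
point_b = (0.51, 0.50, 0.20)
enc_a = multi_frequency_encoding(point_a)
enc_b = multi_frequency_encoding(point_b)

# The highest band separates the two points more than the lowest band does.
low_band_gap = band_distance(enc_a, enc_b, band=0)
high_band_gap = band_distance(enc_a, enc_b, band=3)
```

Under these assumptions, higher bands amplify small displacements, which is one plausible reading of how such an encoding could help distinguish fine-grained motion. Appending depth as an extra coordinate is likewise only one possible way to make the encoding depth-aware.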
Problem

Research questions and friction points this paper is trying to address.

video generation
appearance-motion disentanglement
generative control
versatile video editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

appearance-motion disentanglement
3D control signal
point cloud representation
multi-frequency positional encoding
depth-aware encoding