FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control

📅 2026-02-13

📈 Citations: 0

✨ Influential: 0

📝 Abstract

Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
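
To make the idea of a multi-frequency positional encoding over a point cloud concrete, here is a minimal sketch. It is not FlexAM's implementation: the abstract does not give the formulation, so this assumes a generic sinusoidal encoding of the kind commonly used for 3D coordinates. The function name `multi_frequency_encoding`, the `num_bands` parameter, and the power-of-two frequency schedule are all hypothetical choices made for illustration.

```python
import numpy as np

def multi_frequency_encoding(points, num_bands=4):
    """Encode (N, 3) points with sin/cos at frequencies 2^0 .. 2^(num_bands-1).

    Assumption: a generic sinusoidal encoding, not the paper's exact scheme.
    """
    points = np.asarray(points, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_bands)                          # (num_bands,)
    # Scale every coordinate by every frequency: (N, num_bands, 3)
    angles = points[:, None, :] * freqs[None, :, None] * np.pi
    # Per band, stack sin and cos of x, y, z: (N, num_bands, 6)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(points.shape[0], -1)                      # (N, num_bands * 6)

# Two points that differ by a small displacement along x.
points = np.array([[0.10, 0.20, 0.50],
                   [0.11, 0.20, 0.50]])
features = multi_frequency_encoding(points, num_bands=4)
print(features.shape)  # (2, 24)

# Per-band feature distance between the two points: low band vs. high band.
per_band = features.reshape(2, 4, 6)
low_gap = np.linalg.norm(per_band[0, 0] - per_band[1, 0])
high_gap = np.linalg.norm(per_band[0, 3] - per_band[1, 3])
```

In this sketch the higher-frequency band separates the two nearby points more strongly than the lowest band (`high_gap > low_gap`), which is the usual motivation for mixing several frequencies when small displacements need to be distinguished.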

Problem

Research questions and friction points this paper is trying to address.

video generation

appearance-motion disentanglement

generative control

versatile video editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

appearance-motion disentanglement

3D control signal

point cloud representation

multi-frequency positional encoding