Let Your Image Move with Your Motion! -- Implicit Multi-Object Multi-Motion Transfer

📅 2026-03-01
🤖 AI Summary
Existing image-to-video motion transfer methods struggle to assign distinct motions to multiple objects within a single image, often resulting in motion entanglement. This work proposes FlexiMMT, the first implicit image-to-video motion transfer framework that explicitly supports multi-object, multi-motion transfer. Given a multi-object image and several reference videos, it extracts a motion representation from each video and assigns it to a chosen object, allowing arbitrary motion-to-object pairings. To disentangle motions across objects, a Motion Decoupled Mask Attention Mechanism uses object-specific masks so that each object's motion and text tokens affect only that object's region, and a Differentiated Mask Propagation Mechanism derives these masks from the diffusion model's attention and propagates them across frames. The method achieves state-of-the-art results on multi-object, multi-motion transfer, producing precise and composable motions while substantially reducing cross-object motion interference.
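
The summary does not describe how the mask constraint is wired into attention. The NumPy sketch below illustrates the general idea only, assuming video tokens act as queries and each object's motion and text tokens act as keys and values in a cross-attention layer, each tagged with the index of the object it conditions. The function names, shapes, and the choice to give zero output to positions outside every mask are hypothetical, not FlexiMMT's implementation.

```python
import numpy as np


def build_allowed_mask(region_masks, token_owner):
    """Hypothetical helper.
    region_masks: (K, N) bool; region_masks[k, i] is True if video token i lies inside object k.
    token_owner:  (M,) int; index of the object each motion/text token conditions.
    Returns (N, M) bool: allowed[i, j] is True iff video token i is inside
    the region of the object that conditioning token j belongs to.
    """
    region_masks = np.asarray(region_masks, dtype=bool)
    token_owner = np.asarray(token_owner, dtype=int)
    return region_masks[token_owner].T


def masked_cross_attention(q, k, v, allowed):
    """Scaled dot-product attention in which disallowed (query, key) pairs get zero weight.
    q: (N, d) video-token queries; k: (M, d) keys and v: (M, dv) values from conditioning tokens.
    Queries with no allowed key (e.g. background) receive a zero vector (an assumption).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    has_key = allowed.any(axis=1, keepdims=True)
    # Block disallowed pairs; fully blocked rows are set to 0 to avoid inf - inf = nan.
    scores = np.where(allowed, scores, -np.inf)
    scores = np.where(has_key, scores, 0.0)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores) * allowed
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ v


# Two objects over four video tokens; token 3 is background.
region_masks = np.array([[1, 1, 0, 0],
                         [0, 0, 1, 0]], dtype=bool)
token_owner = np.array([0, 1])   # conditioning token 0 -> object 0, token 1 -> object 1
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(2, 8))
v = np.eye(2)                    # one-hot values make the routing visible
allowed = build_allowed_mask(region_masks, token_owner)
out = masked_cross_attention(q, k, v, allowed)
print(out)  # rows 0-1 -> [1, 0], row 2 -> [0, 1], row 3 -> [0, 0]
```

Because each video token can only read from the tokens of the object whose mask contains it, object 1's motion cannot leak into object 0's region regardless of the attention scores. A token covered by two masks would attend to both objects' tokens, which is one reason mutually exclusive masks are useful.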

📝 Abstract
Motion transfer has emerged as a promising direction for controllable video generation, yet existing methods largely focus on single-object scenarios and struggle when multiple objects require distinct motion patterns. In this work, we present FlexiMMT, the first implicit image-to-video (I2V) motion transfer framework that explicitly enables multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, FlexiMMT independently extracts motion representations and accurately assigns them to different objects, supporting flexible recombination and arbitrary motion-to-object mappings. To address the core challenge of cross-object motion entanglement, we introduce a Motion Decoupled Mask Attention Mechanism that uses object-specific masks to constrain attention, ensuring that motion and text tokens influence only their designated regions. We further propose a Differentiated Mask Propagation Mechanism that derives object-specific masks directly from diffusion attention and efficiently propagates them across frames. Extensive experiments demonstrate that FlexiMMT achieves precise, compositional, and state-of-the-art performance in I2V-based multi-object multi-motion transfer.
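
The abstract states that the masks come from diffusion attention and are propagated across frames, but not how. The sketch below is one hypothetical way to do this, not FlexiMMT's method: each object's attention map over spatial positions is min-max normalized, each position goes to the single strongest object if its score clears a threshold (our stand-in for "differentiated", i.e. non-overlapping masks), and each later frame blends its scores with the previous frame's mask. The function names, `threshold=0.5`, and `momentum=0.5` are illustrative assumptions.

```python
import numpy as np


def normalize_maps(attn):
    """Min-max normalize each object's attention map (rows of a (K, N) array) to [0, 1]."""
    attn = np.asarray(attn, dtype=float)
    lo = attn.min(axis=1, keepdims=True)
    hi = attn.max(axis=1, keepdims=True)
    return (attn - lo) / np.maximum(hi - lo, 1e-8)


def exclusive_masks(scores, threshold):
    """Give each position to the highest-scoring object, but only if that score clears the threshold."""
    num_objects = scores.shape[0]
    winner = scores.argmax(axis=0)              # (N,) best object per position
    strong = scores.max(axis=0) >= threshold    # (N,) confident positions only
    return (winner[None, :] == np.arange(num_objects)[:, None]) & strong[None, :]


def masks_from_attention(attn, threshold=0.5):
    """Hypothetical single-frame masks derived only from that frame's attention."""
    return exclusive_masks(normalize_maps(attn), threshold)


def propagate_masks(attn_per_frame, threshold=0.5, momentum=0.5):
    """Hypothetical propagation.
    attn_per_frame: (T, K, N) attention of K object tokens over N spatial positions per frame.
    Frame 0 uses its own attention; each later frame blends its normalized attention with
    the previous frame's mask, so weakly attended positions can keep their earlier owner.
    """
    masks = []
    prev = None
    for attn in attn_per_frame:
        scores = normalize_maps(attn)
        if prev is not None:
            scores = momentum * prev + (1.0 - momentum) * scores
        prev = exclusive_masks(scores, threshold).astype(float)
        masks.append(prev.astype(bool))
    return np.stack(masks)


# Two objects, four positions; in frame 1 the attention at position 1 becomes ambiguous.
frame0 = [[0.9, 0.8, 0.1, 0.0], [0.0, 0.1, 0.7, 0.05]]
frame1 = [[0.9, 0.3, 0.1, 0.0], [0.0, 0.3, 0.7, 0.05]]
single = masks_from_attention(frame1)        # position 1 is left unassigned
tracked = propagate_masks([frame0, frame1])  # position 1 stays with object 0
print(single.astype(int))
print(tracked[1].astype(int))
```

Deriving masks from attention that the model already computes avoids running a separate segmentation network; the carry-over term is only a simple way to keep masks temporally stable in this toy setting.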
Problem

Research questions and friction points this paper is trying to address.

motion transfer
multi-object
multi-motion
image-to-video
controllable video generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-object motion transfer
implicit image-to-video generation
motion decoupled attention
mask propagation
compositional video synthesis