Let Your Image Move with Your Motion! -- Implicit Multi-Object Multi-Motion Transfer

📅 2026-03-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing image-to-video motion transfer methods struggle to assign distinct motions to multiple objects within a single image, which often leads to motion entanglement. This work proposes FlexiMMT, which the authors describe as the first implicit image-to-video (I2V) motion transfer framework to explicitly support multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, it extracts a motion representation from each video independently and assigns it to a chosen object, so motions and objects can be paired arbitrarily. A Motion Decoupled Mask Attention Mechanism uses object-specific masks to restrict motion and text tokens to their designated regions, and a Differentiated Mask Propagation Mechanism derives those masks from diffusion attention and propagates them across frames. The authors report precise, compositional, state-of-the-art results on multi-object, multi-motion transfer, with reduced cross-object motion interference.
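To make the idea of deriving object masks from attention more concrete, here is a minimal NumPy sketch. It is not the paper's procedure: the function name `masks_from_attention`, the min-max normalization, the winner-takes-all assignment, and the `threshold` value are all assumptions chosen for illustration. It only shows one plausible way to turn per-object attention maps for a single frame into disjoint boolean masks.

```python
import numpy as np

def masks_from_attention(attn, threshold=0.5):
    """Turn per-object attention maps into disjoint boolean masks (illustrative only).

    attn: array of shape (num_objects, height, width) holding non-negative
          attention that each object's tokens place on each spatial location.
    A location is assigned to the object with the highest normalized attention,
    but only if that value reaches `threshold`; otherwise it stays background.
    """
    num_objects = attn.shape[0]
    flat = attn.reshape(num_objects, -1)

    # Min-max normalize each object's map so the maps are comparable (assumption).
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    norm = (flat - lo) / np.maximum(hi - lo, 1e-8)

    winner = norm.argmax(axis=0)             # strongest object per location
    strong = norm.max(axis=0) >= threshold   # is the strongest response strong enough?

    object_ids = np.arange(num_objects)[:, None]
    masks = (object_ids == winner[None, :]) & strong[None, :]
    return masks.reshape(attn.shape)
```

Because every location has exactly one winner, the resulting masks never overlap, which is the property a mask-constrained attention layer would need. How FlexiMMT actually normalizes, thresholds, and propagates masks across frames is not specified in this summary.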

📝 Abstract

Motion transfer has emerged as a promising direction for controllable video generation, yet existing methods largely focus on single-object scenarios and struggle when multiple objects require distinct motion patterns. In this work, we present FlexiMMT, the first implicit image-to-video (I2V) motion transfer framework that explicitly enables multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, FlexiMMT independently extracts motion representations and accurately assigns them to different objects, supporting flexible recombination and arbitrary motion-to-object mappings. To address the core challenge of cross-object motion entanglement, we introduce a Motion Decoupled Mask Attention Mechanism that uses object-specific masks to constrain attention, ensuring that motion and text tokens only influence their designated regions. We further propose a Differentiated Mask Propagation Mechanism that derives object-specific masks directly from diffusion attention and progressively propagates them across frames efficiently. Extensive experiments demonstrate that FlexiMMT achieves precise, compositional, and state-of-the-art performance in I2V-based multi-object multi-motion transfer.
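The mask-constrained attention described in the abstract can be illustrated with a small NumPy sketch. This is a simplified stand-in, not the authors' implementation: the token layout (spatial tokens followed by each object's motion tokens), the helper names `build_allow_matrix` and `masked_attention`, and the choice to leave spatial-to-spatial attention unrestricted are assumptions. The point it demonstrates is that a boolean mask on the attention scores keeps one object's motion tokens from influencing another object's region.

```python
import numpy as np

def build_allow_matrix(object_masks, tokens_per_object):
    """Build a (num_pixels, num_pixels + num_objects * tokens_per_object) boolean matrix.

    object_masks: (num_objects, num_pixels) boolean array; True where the pixel
                  belongs to that object. Background pixels are False everywhere.
    Keys are assumed to be laid out as [all pixels, object 0 motion tokens,
    object 1 motion tokens, ...]. This layout is hypothetical.
    """
    num_objects, num_pixels = object_masks.shape
    # Spatial tokens may attend to every spatial token (assumption).
    spatial = np.ones((num_pixels, num_pixels), dtype=bool)
    # A pixel may attend to a motion token only if it lies inside that object's mask.
    motion = np.repeat(object_masks.T, tokens_per_object, axis=1)
    return np.concatenate([spatial, motion], axis=1)

def masked_attention(queries, keys, values, allow):
    """Scaled dot-product attention; allow[i, j] says whether query i may use key j."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores = np.where(allow, scores, -np.inf)            # block disallowed pairs
    scores = scores - scores.max(axis=-1, keepdims=True) # numerical stability
    weights = np.exp(scores)                             # exp(-inf) becomes exactly 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

With two objects and a background pixel, the attention weights from a pixel inside object 0 to object 1's motion tokens come out exactly zero, and the background pixel ignores all motion tokens. Every row still has the spatial keys available, so the softmax never divides by zero.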

Problem

Research questions and friction points this paper is trying to address.

motion transfer

multi-object

multi-motion

image-to-video

controllable video generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-object motion transfer

implicit image-to-video generation

motion decoupled attention