MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Traditional image composition methods rely on manual layout planning and produce only static images, failing to model realistic motion. To address this, the authors propose MotionCom, a training-free, motion-aware image composition framework that seamlessly integrates target objects into novel scenes and generates dynamically coherent composites without optimization or human intervention. The approach combines a multimodal chain-of-thought–driven LVLM for joint spatial–motion planning with MotionPaint, a technique that distills a pretrained video diffusion prior to inject motion semantics into image composition. Extensive experiments show that the method consistently outperforms state-of-the-art approaches on quantitative metrics (FID, LPIPS) and human evaluations, producing results with natural motion continuity and physically plausible scene–object interactions.

📝 Abstract
This work presents MotionCom, a training-free, motion-aware, diffusion-based image composition framework that enables automatic and seamless integration of target objects into new scenes with dynamically coherent results, without finetuning or optimization. Traditional approaches in this area suffer from two significant limitations: they require manual planning for object placement and often generate static compositions lacking motion realism. MotionCom addresses these issues by utilizing a Large Vision Language Model (LVLM) for intelligent planning and a video diffusion prior for motion-infused image synthesis, streamlining the composition process. Our multimodal Chain-of-Thought (CoT) prompting with the LVLM automates the strategic placement planning of foreground objects, considering their potential motion and interaction within the scenes. Complementing this, we propose a novel method, MotionPaint, to distill motion-aware information from pretrained video diffusion models in the generation phase, ensuring that these objects are not only seamlessly integrated but also endowed with realistic motion. Extensive quantitative and qualitative results highlight MotionCom's superiority, showcasing its efficiency in streamlining the planning process and its capability to produce compositions that authentically depict motion and interaction.
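To make the planning stage concrete, the sketch below shows one plausible way the LVLM's CoT output could be consumed downstream. The JSON schema (`bbox` in normalized coordinates plus a `motion` hint) and the function name `parse_placement` are assumptions for illustration, not the paper's actual interface; the paper only specifies that the LVLM plans object placement with motion and interaction in mind.

```python
import json

def parse_placement(lvlm_reply: str, scene_w: int, scene_h: int):
    """Convert a hypothetical LVLM planning reply into a pixel bounding box.

    Assumes the LVLM, after chain-of-thought reasoning about plausible
    motion in the scene, answers with JSON of the form
    {"bbox": [x0, y0, x1, y1], "motion": "..."} in normalized [0, 1] coords.
    """
    plan = json.loads(lvlm_reply)
    x0, y0, x1, y1 = plan["bbox"]
    box = (round(x0 * scene_w), round(y0 * scene_h),
           round(x1 * scene_w), round(y1 * scene_h))
    return box, plan.get("motion", "static")

# Example reply the planner might produce for placing a surfer in a seascape:
reply = '{"bbox": [0.25, 0.40, 0.55, 0.80], "motion": "gliding left-to-right"}'
box, motion = parse_placement(reply, scene_w=1024, scene_h=768)
# box is now (256, 307, 563, 614); the generation stage (MotionPaint)
# would inpaint the object into this region under the video diffusion prior.
```

The placed region and motion hint would then condition the generation phase; the actual MotionPaint distillation step requires a pretrained video diffusion model and is not reproduced here.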
Problem

Research questions and friction points this paper is trying to address.

Dynamic motion-aware image composition with realistic effects
Overcoming manual planning and static output limitations
Unifying simulation and diffusion methods for coherent results
Innovation

Methods, ideas, or system contributions that make the work stand out.

LVLM-based intelligent planner for scenario analysis
Enhanced MPM simulation for physical interactions
Pretrained video diffusion for motion effects