🤖 AI Summary
This work addresses an underexplored yet practically important cinematic requirement in video generation: object-controllable motion, specifically Frame In and Frame Out. Methodologically, we introduce the first synthetic dataset and a dedicated evaluation protocol tailored to Frame In/Out tasks. We further propose an identity-aware, motion-controllable video diffusion Transformer that integrates motion-trajectory guidance with identity-feature disentanglement and re-injection mechanisms, enabling natural, path-guided object entrance and exit. Extensive experiments show that our approach achieves state-of-the-art performance on three key metrics (object controllability, identity consistency, and motion naturalness), significantly outperforming existing video generation baselines. To our knowledge, this is the first systematic solution for cinematic Frame In/Out control in diffusion-based video generation.
📝 Abstract
Controllability, temporal coherence, and detail synthesis remain the most critical challenges in video generation. In this paper, we focus on a commonly used yet underexplored cinematic technique known as Frame In and Frame Out. Specifically, starting from image-to-video generation, users can direct objects in the image to naturally leave the scene, or introduce brand-new identity references that enter the scene, guided by user-specified motion trajectories. To support this task, we introduce a semi-automatically curated dataset, a comprehensive evaluation protocol targeting this setting, and an efficient identity-preserving, motion-controllable video Diffusion Transformer architecture. Our evaluation shows that the proposed approach significantly outperforms existing baselines.
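The abstract does not specify how a user-drawn trajectory is represented before it conditions the Diffusion Transformer. Purely as an illustration (not the authors' method), one common scheme is to map each trajectory point to sinusoidal features and treat the result as a sequence of conditioning tokens; every function name and the encoding below are assumptions:

```python
import numpy as np

def sinusoidal_embed(x, dim=16):
    # Map scalar coordinates to sinusoidal features, as in
    # Transformer positional encodings (assumed scheme, not from the paper).
    freqs = np.exp(np.linspace(0.0, np.log(1000.0), dim // 2))
    angles = np.asarray(x, dtype=np.float64)[..., None] * freqs  # (..., dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def trajectory_tokens(points, dim=16):
    # Turn a user-specified trajectory of (x, y) points, shape (T, 2),
    # into a (T, 2*dim) token sequence that a conditioning branch could consume.
    pts = np.asarray(points, dtype=np.float64)
    return np.concatenate([sinusoidal_embed(pts[:, 0], dim),
                           sinusoidal_embed(pts[:, 1], dim)], axis=-1)

# Hypothetical Frame Out path in normalized [0, 1] coordinates:
# the final point lies outside the right edge, signalling an exit.
path = [(0.2, 0.5), (0.5, 0.5), (0.8, 0.5), (1.1, 0.5)]
tokens = trajectory_tokens(path)
print(tokens.shape)  # (4, 32)
```

In such a sketch, the token sequence would be injected alongside identity features (e.g. via cross-attention) so the model can follow the path while preserving the referenced object's appearance; the actual injection mechanism used by the paper is not detailed here.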