🤖 AI Summary
This work addresses identity confusion, anatomically implausible limbs, and incoherent motion, three failures that existing personalized video generation methods commonly exhibit when applied to two-person martial arts sparring. The authors introduce what they describe as the first personalized two-character martial arts combat video generation task, targeting a gap in human video synthesis for complex dyadic interactions. They construct a 3D sparring dataset with the Unity game engine and propose a tailored generative model, MagicFight, that integrates identity-preserving mechanisms with motion coordination constraints. Experimental results indicate that the approach produces high-fidelity videos with consistent character identities, temporally coherent movements, and realistic interaction dynamics, laying groundwork for interactive video content creation.
📝 Abstract
Amid the surge in generic text-to-video generation, personalized human video generation has advanced notably, but mainly in single-person scenarios. To our knowledge, two-person interaction, particularly martial arts combat, remains unexplored. We identify a significant gap: existing models for single-person dance generation fail to capture the subtleties and complexities of two engaged fighters, resulting in identity confusion, anomalous limbs, and action mismatches. To address this, we introduce a new task, Personalized Martial Arts Combat Video Generation, and an approach, MagicFight, specifically designed to overcome these problems. Because the task is new, no appropriate dataset exists. We therefore generate a bespoke dataset using the game physics engine Unity, building a wide range of 3D characters, martial arts moves, and scenes to represent the diversity of combat. MagicFight refines and adapts existing models and strategies to generate high-fidelity two-person combat videos that maintain individual identities and keep action sequences seamless and coherent, laying the groundwork for future work on interactive video content creation.