🤖 AI Summary
Existing two-person motion generation methods suffer from modeling deficiencies: naive concatenation of individual skeletons ignores interaction causality, while separate modeling fails to capture dynamic role evolution—leading to suboptimal performance and parameter redundancy. This paper proposes a two-stage framework comprising temporal modeling and interaction fusion, introducing three novel mechanisms: causal interaction injection, role-evolution scanning, and local pattern enhancement. Our method builds an efficient Transformer architecture grounded in causal sequence modeling, dynamic role-aware attention, and lightweight spatiotemporal convolutions. Evaluated on InterHuman and InterX, it significantly outperforms state-of-the-art approaches: improving motion plausibility by 12.6%, reducing model parameters by 37%, and generating motions that are more temporally coherent, physically plausible, and socially semantically consistent.
📝 Abstract
Human-human motion generation is essential for understanding humans as social beings. Current methods fall into two main categories: single-person-based methods and separate modeling-based methods. To delve into this field, we abstract the overall generation process into a general framework MetaMotion, which consists of two phases: temporal modeling and interaction mixing. For temporal modeling, the single-person-based methods concatenate two people into a single one directly, while the separate modeling-based methods skip the modeling of interaction sequences. The inadequate modeling described above resulted in sub-optimal performance and redundant model parameters. In this paper, we introduce TIMotion (Temporal and Interactive Modeling), an efficient and effective framework for human-human motion generation. Specifically, we first propose Causal Interactive Injection to model two separate sequences as a causal sequence leveraging the temporal and causal properties. Then we present Role-Evolving Scanning to adjust to the change in the active and passive roles throughout the interaction. Finally, to generate smoother and more rational motion, we design Localized Pattern Amplification to capture short-term motion patterns. Extensive experiments on InterHuman and InterX demonstrate that our method achieves superior performance. The project code will be released upon acceptance. Project page: https://aigc-explorer.github.io/TIMotion-page/