Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts

๐Ÿ“… 2025-11-17
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing 3D human motion generation methods are strongly task-specific and generalize poorly across heterogeneous interaction scenarios. To address this, the authors propose Uni-Inter, a task-agnostic framework for motion synthesis spanning human-human, human-object, and human-scene interactions. Its core component, the Unified Interactive Volume (UIV), maps diverse interaction entities (human bodies, objects, scenes) into a shared volumetric spatial representation, and motion generation is formulated as joint-wise probabilistic prediction over this volume, capturing fine-grained spatial dependencies for relation-consistent compound interaction reasoning. By avoiding task-specific modules, Uni-Inter achieves competitive performance on three representative interaction benchmarks and generalizes well to novel entity combinations, supporting the unified volumetric representation as a paradigm for 3D interactive motion generation.

๐Ÿ“ Abstract
We present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios, including human-human, human-object, and human-scene, within a single, task-agnostic architecture. In contrast to existing methods that rely on task-specific designs and exhibit limited generalization, Uni-Inter introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field. This enables consistent relational reasoning and compound interaction modeling. Motion generation is formulated as joint-wise probabilistic prediction over the UIV, allowing the model to capture fine-grained spatial dependencies and produce coherent, context-aware behaviors. Experiments across three representative interaction tasks demonstrate that Uni-Inter achieves competitive performance and generalizes well to novel combinations of entities. These results suggest that unified modeling of compound interactions offers a promising direction for scalable motion synthesis in complex environments.
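The abstract describes encoding heterogeneous entities into a shared spatial field but gives no implementation details. A minimal sketch of that idea, assuming each entity (body, object, scene) is available as a 3D point cloud: rasterize each one into its own channel of a shared occupancy grid. The function name `build_uiv`, the per-entity-channel layout, and the binary-occupancy encoding are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def build_uiv(entities, bounds, res=32):
    """Illustrative sketch: rasterize each entity's point cloud into its own
    channel of a shared occupancy volume of shape (res, res, res, n_entities).

    entities: list of (N_i, 3) point arrays, one per interaction entity.
    bounds:   (lo, hi) corners of the shared axis-aligned workspace.
    """
    lo = np.asarray(bounds[0], dtype=float)
    hi = np.asarray(bounds[1], dtype=float)
    vol = np.zeros((res, res, res, len(entities)), dtype=np.float32)
    for c, pts in enumerate(entities):
        pts = np.asarray(pts, dtype=float)
        # Map world coordinates into voxel indices within the shared grid.
        idx = ((pts - lo) / (hi - lo) * res).astype(int)
        idx = np.clip(idx, 0, res - 1)
        vol[idx[:, 0], idx[:, 1], idx[:, 2], c] = 1.0
    return vol
```

Because every entity lands in the same grid, spatial relations between them (proximity, contact, containment) become local patterns in one tensor, which is the property the shared-field design is after.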
Problem

Research questions and friction points this paper is trying to address.

Unifying human motion generation across diverse interaction scenarios in single framework
Encoding heterogeneous interactive entities into shared volumetric representation for consistency
Enabling coherent motion synthesis for novel entity combinations in complex environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for diverse human interaction scenarios
Introduces Unified Interactive Volume for shared spatial encoding
Formulates motion as joint-wise probabilistic prediction over UIV
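The last bullet, joint-wise probabilistic prediction, can be illustrated with a common parameterization: a network conditioned on the UIV predicts a mean and log-variance per joint coordinate, and a pose is drawn by reparameterized sampling. The function `sample_joints` and the diagonal-Gaussian choice are assumptions for illustration; the paper does not specify its predictive distribution here.

```python
import numpy as np

def sample_joints(mu, log_var, rng=None):
    """Draw one pose from per-joint diagonal Gaussians.

    mu, log_var: arrays of shape (n_joints, 3), e.g. predicted by a network
    conditioned on the UIV. Uses the reparameterization trick:
    sample = mu + sigma * eps, with eps ~ N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

Sampling per joint (rather than regressing a single deterministic pose) lets the model express multiple plausible interactions with the same entities, which matters when several motions are compatible with one scene.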
๐Ÿ”Ž Similar Papers
No similar papers found.
Authors
Sheng Liu (Nanjing University, Nanjing, China)
Yuanzhi Liang (UTS)
Jiepeng Wang (The University of Hong Kong)
Sidan Du (Nanjing University)
Chi Zhang (Institute of Artificial Intelligence, China Telecom (TeleAI), Shanghai, China)
Xuelong Li (Institute of Artificial Intelligence, China Telecom (TeleAI), Shanghai, China)