Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts

๐Ÿ“… 2025-11-17
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing 3D human motion generation methods are strongly task-specific and generalize poorly across heterogeneous interaction scenarios. To address this, the authors propose Uni-Inter, a task-agnostic framework for motion synthesis spanning human-human, human-object, and human-scene interactions. Its core component, the Unified Interactive Volume (UIV), maps diverse interaction entities (human bodies, objects, scenes) into a shared volumetric spatial representation, and motion generation is formulated as joint-wise probabilistic prediction over this volume, capturing fine-grained spatial dependencies for relation-consistent compound interaction reasoning. By avoiding task-specific modules, Uni-Inter achieves competitive performance on three representative interaction benchmarks and generalizes well to novel entity combinations, supporting the unified volumetric representation as a paradigm for 3D interactive motion generation.

๐Ÿ“ Abstract
We present Uni-Inter, a unified framework for human motion generation that supports a wide range of interaction scenarios, including human-human, human-object, and human-scene, within a single, task-agnostic architecture. In contrast to existing methods that rely on task-specific designs and exhibit limited generalization, Uni-Inter introduces the Unified Interactive Volume (UIV), a volumetric representation that encodes heterogeneous interactive entities into a shared spatial field. This enables consistent relational reasoning and compound interaction modeling. Motion generation is formulated as joint-wise probabilistic prediction over the UIV, allowing the model to capture fine-grained spatial dependencies and produce coherent, context-aware behaviors. Experiments across three representative interaction tasks demonstrate that Uni-Inter achieves competitive performance and generalizes well to novel combinations of entities. These results suggest that unified modeling of compound interactions offers a promising direction for scalable motion synthesis in complex environments.
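The abstract describes encoding heterogeneous entities into a shared spatial field but gives no implementation details. A minimal sketch of that idea, assuming each entity (body, object, scene) is available as a 3D point cloud: rasterize each one into its own channel of a shared occupancy grid. The function name `build_uiv`, the per-entity-channel layout, and the binary-occupancy encoding are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def build_uiv(entities, bounds, res=32):
    """Illustrative sketch: rasterize each entity's point cloud into its own
    channel of a shared occupancy volume of shape (res, res, res, n_entities).

    entities: list of (N_i, 3) point arrays, one per interaction entity.
    bounds:   (lo, hi) corners of the shared axis-aligned workspace.
    """
    lo = np.asarray(bounds[0], dtype=float)
    hi = np.asarray(bounds[1], dtype=float)
    vol = np.zeros((res, res, res, len(entities)), dtype=np.float32)
    for c, pts in enumerate(entities):
        pts = np.asarray(pts, dtype=float)
        # Map world coordinates into voxel indices within the shared grid.
        idx = ((pts - lo) / (hi - lo) * res).astype(int)
        idx = np.clip(idx, 0, res - 1)
        vol[idx[:, 0], idx[:, 1], idx[:, 2], c] = 1.0
    return vol
```

Because every entity lands in the same grid, spatial relations between them (proximity, contact, containment) become local patterns in one tensor, which is the property the shared-field design is after.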
Problem

Research questions and friction points this paper is trying to address.

Unifying human motion generation across diverse interaction scenarios in single framework
Encoding heterogeneous interactive entities into shared volumetric representation for consistency
Enabling coherent motion synthesis for novel entity combinations in complex environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for diverse human interaction scenarios
Introduces Unified Interactive Volume for shared spatial encoding
Formulates motion as joint-wise probabilistic prediction over UIV
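The last bullet, joint-wise probabilistic prediction, can be illustrated with a common parameterization: a network conditioned on the UIV predicts a mean and log-variance per joint coordinate, and a pose is drawn by reparameterized sampling. The function `sample_joints` and the diagonal-Gaussian choice are assumptions for illustration; the paper does not specify its predictive distribution here.

```python
import numpy as np

def sample_joints(mu, log_var, rng=None):
    """Draw one pose from per-joint diagonal Gaussians.

    mu, log_var: arrays of shape (n_joints, 3), e.g. predicted by a network
    conditioned on the UIV. Uses the reparameterization trick:
    sample = mu + sigma * eps, with eps ~ N(0, I).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

Sampling per joint (rather than regressing a single deterministic pose) lets the model express multiple plausible interactions with the same entities, which matters when several motions are compatible with one scene.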
๐Ÿ”Ž Similar Papers
No similar papers found.
Authors
Sheng Liu (Nanjing University, Nanjing, China)
Yuanzhi Liang (UTS)
Jiepeng Wang (The University of Hong Kong)
Sidan Du (Nanjing University)
Chi Zhang (Institute of Artificial Intelligence, China Telecom (TeleAI), Shanghai, China)
Xuelong Li (Institute of Artificial Intelligence, China Telecom (TeleAI), Shanghai, China)