Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning

📅 2025-08-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing cross-domain 3D human motion models rely on domain-specific components and multi-stage training, limiting generalization and scalability. Method: We propose Human-in-Context (HiC), the first unified framework enabling joint multimodal (pose/mesh), multi-task, and multi-dataset modeling within a single end-to-end pipeline. Contributions/Results: HiC introduces (1) a max-min similarity prompting sampling strategy to enhance contextual awareness; (2) a dual-branch context injection architecture that disentangles and fuses cross-domain semantic information; and (3) a context-aware unified network, eliminating domain-customized modules. Evaluated across multiple benchmarks, HiC consistently outperforms state-of-the-art methods in cross-domain generalization, large-scale data adaptation, and zero-shot transfer—achieving new SOTA performance. The framework establishes a highly flexible, scalable, and general-purpose paradigm for 3D human motion modeling.

Technology Category

Application Category

📝 Abstract
This paper aims to model 3D human motion across domains, where a single model is expected to handle multiple modalities, tasks, and datasets. Existing cross-domain models often rely on domain-specific components and multi-stage training, which limits their practicality and scalability. To overcome these challenges, we propose a new setting to train a unified cross-domain model through a single process, eliminating the need for domain-specific components and multi-stage training. We first introduce Pose-in-Context (PiC), which leverages in-context learning to create a pose-centric cross-domain model. While PiC generalizes across multiple pose-based tasks and datasets, it encounters difficulties with modality diversity, prompting strategy, and contextual dependency handling. We thus propose Human-in-Context (HiC), an extension of PiC that broadens generalization across modalities, tasks, and datasets. HiC combines pose and mesh representations within a unified framework, expands task coverage, and incorporates larger-scale datasets. Additionally, HiC introduces a max-min similarity prompt sampling strategy to enhance generalization across diverse domains and a network architecture with dual-branch context injection for improved handling of contextual dependencies. Extensive experimental results show that HiC performs better than PiC in terms of generalization, data scale, and performance across a wide range of domains. These results demonstrate the potential of HiC for building a unified cross-domain 3D human motion model with improved flexibility and scalability. The source codes and models are available at https://github.com/BradleyWang0416/Human-in-Context.
Problem

Research questions and friction points this paper is trying to address.

Model 3D human motion across multiple domains
Unify cross-domain training without domain-specific components
Enhance generalization across modalities, tasks, and datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified cross-domain model via single training process
Pose and mesh representations in one framework
Max-min similarity prompt sampling strategy
🔎 Similar Papers
2024-09-05European Conference on Computer VisionCitations: 4
M
Mengyuan Liu
National Key Laboratory of General Artificial Intelligence, Peking University, Shenzhen Graduate School
Xinshun Wang
Xinshun Wang
Peking University
human perception
Z
Zhongbin Fang
Tencent, China
Deheng Ye
Deheng Ye
Director of AI, Tencent
Applied machine learning
X
Xia Li
Department of Information Technology and Electrical Engineering, ETH Zurich
T
Tao Tang
National Key Laboratory of General Artificial Intelligence, Peking University, Shenzhen Graduate School
Songtao Wu
Songtao Wu
AI Researcher, Sony RDC
AI securityHuman computer interactionEdge AI
Xiangtai Li
Xiangtai Li
Research Scientist, Tiktok, SG; MMLab@NTU
Generative AIComputer Vision
Ming-Hsuan Yang
Ming-Hsuan Yang
University of California at Merced; Google DeepMind
Computer VisionMachine LearningArtificial Intelligence