Human-in-Context: Unified Cross-Domain 3D Human Motion Modeling via In-Context Learning

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

career value

207K/year

🤖 AI Summary

Existing cross-domain 3D human motion models rely on domain-specific components and multi-stage training, limiting generalization and scalability. Method: We propose Human-in-Context (HiC), the first unified framework enabling joint multimodal (pose/mesh), multi-task, and multi-dataset modeling within a single end-to-end pipeline. Contributions/Results: HiC introduces (1) a max-min similarity prompting sampling strategy to enhance contextual awareness; (2) a dual-branch context injection architecture that disentangles and fuses cross-domain semantic information; and (3) a context-aware unified network, eliminating domain-customized modules. Evaluated across multiple benchmarks, HiC consistently outperforms state-of-the-art methods in cross-domain generalization, large-scale data adaptation, and zero-shot transfer—achieving new SOTA performance. The framework establishes a highly flexible, scalable, and general-purpose paradigm for 3D human motion modeling.

Technology Category

Application Category

📝 Abstract

This paper aims to model 3D human motion across domains, where a single model is expected to handle multiple modalities, tasks, and datasets. Existing cross-domain models often rely on domain-specific components and multi-stage training, which limits their practicality and scalability. To overcome these challenges, we propose a new setting to train a unified cross-domain model through a single process, eliminating the need for domain-specific components and multi-stage training. We first introduce Pose-in-Context (PiC), which leverages in-context learning to create a pose-centric cross-domain model. While PiC generalizes across multiple pose-based tasks and datasets, it encounters difficulties with modality diversity, prompting strategy, and contextual dependency handling. We thus propose Human-in-Context (HiC), an extension of PiC that broadens generalization across modalities, tasks, and datasets. HiC combines pose and mesh representations within a unified framework, expands task coverage, and incorporates larger-scale datasets. Additionally, HiC introduces a max-min similarity prompt sampling strategy to enhance generalization across diverse domains and a network architecture with dual-branch context injection for improved handling of contextual dependencies. Extensive experimental results show that HiC performs better than PiC in terms of generalization, data scale, and performance across a wide range of domains. These results demonstrate the potential of HiC for building a unified cross-domain 3D human motion model with improved flexibility and scalability. The source codes and models are available at https://github.com/BradleyWang0416/Human-in-Context.

Problem

Research questions and friction points this paper is trying to address.

Model 3D human motion across multiple domains

Unify cross-domain training without domain-specific components

Enhance generalization across modalities, tasks, and datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified cross-domain model via single training process

Pose and mesh representations in one framework

Max-min similarity prompt sampling strategy

🔎 Similar Papers

HUMOS: Human Motion Model Conditioned on Body Shape

2024-09-05European Conference on Computer VisionCitations: 4

💼 Related Jobs

PhD GenAI Research Scientist Intern

Databricks

SF Bay Area Hourly Rate$54—$60 USD

San Francisco, CA, USA

Research Scientist Intern, Multimodal Generative AI and Robotics (PhD)