🤖 AI Summary
Most existing 3D human pose estimation methods overlook natural language descriptions, a rich and readily available semantic prior, which makes physical contact (e.g., human-human interaction and self-contact) difficult to model in markerless, label-free settings. This work introduces the first semantic-driven framework to leverage large multimodal models (LMMs) for this task: it automatically parses natural-language contact descriptions into tractable contact constraints and integrates a contact-aware loss into the 3D pose optimization pipeline. Its core innovation is using LMMs as zero-shot contact priors, eliminating the reliance on manual annotations or motion-capture data. Experiments demonstrate physically plausible pose reconstruction in both two-person interaction and self-contact scenarios, advancing performance in settings without contact supervision. The code is publicly available.
📝 Abstract
Language is often used to describe physical interaction, yet most 3D human pose estimation methods overlook this rich source of information. We bridge this gap by leveraging large multimodal models (LMMs) as priors for reconstructing contact poses, offering a scalable alternative to traditional methods that rely on human annotations or motion capture data. Our approach extracts contact-relevant descriptors from an LMM and translates them into tractable losses to constrain 3D human pose optimization. Despite its simplicity, our method produces compelling reconstructions for both two-person interactions and self-contact scenarios, accurately capturing the semantics of physical and social interactions. Our results demonstrate that LMMs can serve as powerful tools for contact prediction and pose estimation, offering an alternative to costly manual human annotations or motion capture data. Our code is publicly available at https://prosepose.github.io.
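The pipeline both paragraphs describe, extracting contact descriptors from an LMM and translating them into a loss that constrains pose optimization, can be sketched roughly as follows. This is a minimal illustration only: the part-to-joint mapping, joint indices, and the `contact_loss` function are assumed names for exposition, not the paper's actual interface, and a real pipeline would compute this loss on autodiff tensors inside the optimizer.

```python
import numpy as np

# Hypothetical mapping from coarse body-part names (as an LMM might emit
# when asked "which body parts are touching?") to joint indices in some
# skeleton convention. The indices here are placeholders.
PART_TO_JOINT = {"left_hand": 0, "right_hand": 1, "left_shoulder": 2, "back": 3}

def contact_loss(joints_a, joints_b, contact_pairs):
    """Penalize distance between joint pairs the LMM flags as in contact.

    joints_a, joints_b: (J, 3) arrays of 3D joint positions for two people
                        (pass the same array twice for self-contact).
    contact_pairs: list of (part_on_a, part_on_b) strings parsed from the
                   LMM's natural-language contact description.
    """
    loss = 0.0
    for part_a, part_b in contact_pairs:
        pa = joints_a[PART_TO_JOINT[part_a]]
        pb = joints_b[PART_TO_JOINT[part_b]]
        loss += float(np.sum((pa - pb) ** 2))  # squared Euclidean distance
    return loss

# E.g. the LMM describes "person A's right hand touches person B's left shoulder":
joints_a = np.zeros((4, 3))
joints_b = np.ones((4, 3))
pairs = [("right_hand", "left_shoulder")]
print(contact_loss(joints_a, joints_b, pairs))  # → 3.0
```

Minimizing such a term during pose optimization pulls the named body parts together, which is how a free-text contact description becomes a geometric constraint.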