L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild

📅 2025-01-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Monocular 2D-to-3D pose lifting for animals in the wild suffers from low accuracy, cross-species motion retargeting is difficult, and real-world 3D pose data is severely scarce owing to the absence of ground-truth depth annotations and the uncontrollability of animal behaviour.
Method: We propose an end-to-end 3D character-driven framework comprising (1) a lightweight attention-based MLP network for robust 2D-to-3D pose lifting; (2) a novel cross-species motion-synthesis and retargeting lookup table that decouples anatomical constraints, enabling adaptation to arbitrary 3D characters; and (3) a synthetic data generation pipeline built on rigged avatars and diverse motion priors.
Results: Experiments on natural-scene monocular videos, without depth supervision, demonstrate significant improvements in 3D pose estimation fidelity and motion retargeting accuracy. The framework generalizes strongly and, for the first time, enables high-fidelity animation of arbitrary 3D animal characters driven by a single camera.
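The summary names a lightweight attention-based MLP lifter but gives no architecture details. A minimal sketch, assuming single-head self-attention over keypoints followed by a two-layer MLP; all weight shapes, dimensions, and the 17-keypoint skeleton are illustrative assumptions, not from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lift_2d_to_3d(pose_2d, params):
    """Hypothetical attention-based MLP lifter.

    pose_2d: (K, 2) array of 2D keypoints; returns (K, 3) 3D keypoints.
    """
    Wq, Wk, Wv, W1, b1, W2, b2 = params
    q, k, v = pose_2d @ Wq, pose_2d @ Wk, pose_2d @ Wv   # project to (K, d)
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))       # keypoint-to-keypoint attention
    h = attn @ v                                         # context-mixed features (K, d)
    h = np.maximum(0.0, h @ W1 + b1)                     # MLP hidden layer (ReLU)
    return h @ W2 + b2                                   # regress per-keypoint 3D coords

# Random weights just to show the data flow (K=17 keypoints, d=16 features).
K, d = 17, 16
rng = np.random.default_rng(0)
params = (rng.normal(size=(2, d)), rng.normal(size=(2, d)), rng.normal(size=(2, d)),
          rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=(d, 3)), np.zeros(3))
pose_3d = lift_2d_to_3d(rng.normal(size=(K, 2)), params)
print(pose_3d.shape)  # (17, 3)
```

Because the lifter consumes only 2D keypoints (no image features), it matches the summary's claim of being image-independent and cheap to run.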

📝 Abstract
While 2D pose estimation has advanced our ability to interpret body movements in animals and primates, it is limited by the lack of depth information, which constrains its range of applications. 3D pose estimation provides a more comprehensive solution by incorporating spatial depth, yet creating extensive 3D pose datasets for animals is challenging due to their dynamic and unpredictable behaviours in natural settings. To address this, we propose a hybrid approach that uses rigged avatars and a generation pipeline to produce synthetic datasets with the 3D annotations needed for training. Our method introduces a simple attention-based MLP network for converting 2D poses to 3D, designed to be independent of the input image so that it scales to poses in natural environments. Additionally, we identify that existing anatomical keypoint detectors are insufficient for accurate pose retargeting onto arbitrary avatars. To overcome this, we present a lookup table based on a deep pose estimation method, built from a synthetic collection of diverse actions performed by rigged avatars. Our experiments demonstrate the effectiveness and efficiency of this lookup-table-based retargeting approach. Overall, we propose a comprehensive framework with systematically synthesized datasets for lifting poses from 2D to 3D, which we then use to retarget motion from wild settings onto arbitrary avatars.
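The abstract describes the lookup table only at a high level. One plausible reading is a nearest-neighbour lookup from lifted 3D poses into paired avatar poses collected from synthetic rigged-avatar motions; the sketch below assumes that reading, and all names, shapes, and skeleton sizes are illustrative, not from the paper:

```python
import numpy as np

def build_lookup_table(source_poses, avatar_poses):
    """Pair each synthetic source 3D pose with the avatar pose that produced it."""
    return np.asarray(source_poses), np.asarray(avatar_poses)

def retarget(query_pose, table):
    """Nearest-neighbour retargeting: return the avatar pose paired with the
    synthetic source pose closest to the query (Euclidean distance in joint space)."""
    keys, values = table
    d = np.linalg.norm(keys.reshape(len(keys), -1) - query_pose.reshape(1, -1), axis=1)
    return values[np.argmin(d)]

# Toy table: 5 synthetic source poses (17 joints) paired with avatar poses (24 joints).
rng = np.random.default_rng(1)
keys = rng.normal(size=(5, 17, 3))
values = rng.normal(size=(5, 24, 3))
table = build_lookup_table(keys, values)
out = retarget(keys[2] + 0.01, table)  # query slightly perturbed from entry 2
print(np.allclose(out, values[2]))     # True
```

Because the table maps whole poses rather than individual anatomical keypoints, a lookup of this kind sidesteps the per-keypoint correspondence problem the abstract attributes to existing detectors.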
Problem

Research questions and friction points this paper is trying to address.

2D to 3D Pose Conversion
Keypoint Recognition
Virtual Character Animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

2D to 3D Pose Conversion
Virtual Avatar Animation
Single Camera Outdoor Pose Estimation