FMPose3D: monocular 3D pose estimation via flow matching

📅 2026-02-05
🤖 AI Summary
This work addresses the ill-posed nature of monocular 3D pose estimation, where depth ambiguity and occlusion leave several 3D poses consistent with the same 2D input, by generating diverse yet plausible 3D pose hypotheses. It applies flow matching to the task: a learned conditional velocity field defines an ordinary differential equation that transports samples from a Gaussian prior to multimodal 3D poses in only a few integration steps. To obtain a single accurate prediction, the authors propose a Reprojection-based Posterior Expectation Aggregation (RPEA) module that approximates the posterior expectation over the hypotheses. The method reports state-of-the-art performance on the human benchmarks Human3.6M and MPI-INF-3DHP and on the animal datasets Animal3D and CtrlAni3D, indicating that it balances generation diversity with prediction accuracy across both pose domains.

📝 Abstract
Monocular 3D pose estimation is fundamentally ill-posed due to depth ambiguity and occlusions, thereby motivating probabilistic methods that generate multiple plausible 3D pose hypotheses. In particular, diffusion-based models have recently demonstrated strong performance, but their iterative denoising process typically requires many timesteps for each prediction, making inference computationally expensive. In contrast, we leverage Flow Matching (FM) to learn a velocity field defined by an Ordinary Differential Equation (ODE), enabling efficient generation of 3D pose samples with only a few integration steps. We propose a novel generative pose estimation framework, FMPose3D, that formulates 3D pose estimation as a conditional distribution transport problem. It continuously transports samples from a standard Gaussian prior to the distribution of plausible 3D poses conditioned only on 2D inputs. Although ODE trajectories are deterministic, FMPose3D naturally generates various pose hypotheses by sampling different noise seeds. To obtain a single accurate prediction from those hypotheses, we further introduce a Reprojection-based Posterior Expectation Aggregation (RPEA) module, which approximates the Bayesian posterior expectation over 3D hypotheses. FMPose3D surpasses existing methods on the widely used human pose estimation benchmarks Human3.6M and MPI-INF-3DHP, and further achieves state-of-the-art performance on the 3D animal pose datasets Animal3D and CtrlAni3D, demonstrating strong performance across both 3D pose domains. The code is available at https://github.com/AdaptiveMotorControlLab/FMPose3D.
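
The two steps described in the abstract, few-step ODE sampling from Gaussian noise and reprojection-weighted aggregation of the resulting hypotheses, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration rather than the authors' implementation: `toy_velocity` is a hand-written stand-in for the learned velocity network, the projection is orthographic, and the softmax weighting with a `temperature` parameter is one plausible way to approximate a posterior expectation. The released code should be consulted for the actual method.

```python
import math
import random

def toy_velocity(x, t, pose_2d):
    # Hypothetical stand-in for a trained network v(x, t | 2D pose).
    # It pulls x, y toward the observed 2D joints and mildly shrinks depth;
    # this toy field happens not to depend on t.
    return [(u - px, v - py, -0.5 * pz)
            for (px, py, pz), (u, v) in zip(x, pose_2d)]

def sample_pose(pose_2d, seed, steps=4):
    # Draw x_0 from a standard Gaussian, then integrate dx/dt = v(x, t)
    # from t = 0 to t = 1 with a few explicit Euler steps.
    rng = random.Random(seed)
    x = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in pose_2d]
    dt = 1.0 / steps
    for k in range(steps):
        vel = toy_velocity(x, k * dt, pose_2d)
        x = [(px + dt * vx, py + dt * vy, pz + dt * vz)
             for (px, py, pz), (vx, vy, vz) in zip(x, vel)]
    return x

def reprojection_error(pose_3d, pose_2d):
    # Mean 2D distance per joint, assuming orthographic projection (drop z).
    return sum(math.hypot(px - u, py - v)
               for (px, py, _), (u, v) in zip(pose_3d, pose_2d)) / len(pose_2d)

def aggregate(hypotheses, pose_2d, temperature=0.1):
    # Weight each hypothesis by a softmax over negative reprojection error
    # (an assumed weighting), then return the weighted mean pose.
    errors = [reprojection_error(h, pose_2d) for h in hypotheses]
    best = min(errors)  # subtract the minimum for numerical stability
    raw = [math.exp(-(e - best) / temperature) for e in errors]
    total = sum(raw)
    weights = [r / total for r in raw]
    mean = [tuple(sum(w * h[j][c] for w, h in zip(weights, hypotheses))
                  for c in range(3))
            for j in range(len(pose_2d))]
    return mean, weights

pose_2d = [(0.0, 0.0), (0.5, 1.0), (-0.5, 1.0)]  # three made-up 2D joints
hypotheses = [sample_pose(pose_2d, seed) for seed in range(8)]
prediction, weights = aggregate(hypotheses, pose_2d)
```

Because the integration is deterministic, the same seed always yields the same hypothesis, and diversity comes only from the initial noise, which mirrors the abstract's remark about ODE trajectories. The weighted mean cannot reproject worse than the worst hypothesis, since the per-joint distance is convex in the pose.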
Problem

Research questions and friction points this paper is trying to address.

monocular 3D pose estimation
depth ambiguity
occlusions
ill-posed problem
pose hypotheses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching
3D Pose Estimation
Conditional Distribution Transport
ODE-based Generation
Reprojection-based Aggregation
Authors

Ti Wang
École Polytechnique Fédérale de Lausanne (EPFL)
Xiaohang Yu
École Polytechnique Fédérale de Lausanne (EPFL)
Mackenzie W. Mathis
Swiss Federal Institute of Technology in Lausanne (EPFL)
Systems Neuroscience, Sensorimotor Control, Computer Vision, Machine Learning