DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of multi-view human mesh reconstruction, which is hindered by annotation noise in real-world data and domain shift from synthetic data. To overcome these limitations, the authors propose a diffusion-based zero-shot transfer framework trained exclusively on synthetic data. The method employs a multi-conditioning mechanism to generate view-consistent, pixel-aligned dense proxy representations of the human body. Additionally, it incorporates a vision-prompt-guided hand detail enhancement module and an uncertainty-aware test-time optimization strategy. Evaluated on five real-world benchmarks, the approach achieves state-of-the-art performance, demonstrating significant improvements in reconstruction accuracy—particularly under challenging conditions such as occlusions and partial viewpoints.

📝 Abstract
Human mesh recovery from multi-view images faces a fundamental challenge: real-world datasets contain imperfect ground-truth annotations that bias the models' training, while synthetic data with precise supervision suffers from a domain gap. In this paper, we propose DiffProxy, a novel framework that generates multi-view consistent human proxies for mesh recovery. Central to DiffProxy is the use of diffusion-based generative priors to bridge synthetic training and real-world generalization. Its key innovations include: (1) a multi-conditioning mechanism for generating multi-view consistent, pixel-aligned human proxies; (2) a hand refinement module that incorporates flexible visual prompts to enhance local details; and (3) an uncertainty-aware test-time scaling method that increases robustness to challenging cases during optimization. These designs ensure that the mesh recovery process benefits both from precise synthetic ground truth and from the generative advantages of the diffusion-based pipeline. Trained entirely on synthetic data, DiffProxy achieves state-of-the-art performance across five real-world benchmarks, demonstrating strong zero-shot generalization, particularly in challenging scenarios with occlusions and partial views. Project page: https://wrk226.github.io/DiffProxy.html
Problem

Research questions and friction points this paper is trying to address.

human mesh recovery
multi-view images
domain gap
imperfect annotations
synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based generative priors
Multi-view consistent proxies
Hand refinement with visual prompts
Uncertainty-aware test-time scaling
Zero-shot generalization
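To make the "uncertainty-aware" idea above concrete, here is a toy sketch (not the paper's actual method) of how per-view uncertainty can weight a multi-view optimization: a linear triangulation (DLT) in which observations from less certain views contribute less to the recovered 3D point. The function name, the camera setup, and the inverse-sigma weighting scheme are all illustrative assumptions.

```python
import numpy as np

def triangulate_weighted(proj_mats, points_2d, sigmas):
    """Uncertainty-weighted linear triangulation (DLT sketch).

    proj_mats : list of 3x4 camera projection matrices, one per view
    points_2d : list of (u, v) pixel observations, one per view
    sigmas    : per-view uncertainty (std. dev.); smaller sigma => larger weight
    """
    rows = []
    for P, (u, v), s in zip(proj_mats, points_2d, sigmas):
        w = 1.0 / max(s, 1e-6)            # confidence weight for this view
        # standard DLT constraints, scaled by the view's confidence
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # homogeneous least-squares solution: right singular vector
    # associated with the smallest singular value of A
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                   # dehomogenize to a 3D point
```

With noiseless observations the weighted system is still satisfied exactly, so the true point is recovered regardless of the weights; the weights matter once the per-view proxy predictions are noisy, which is the regime the uncertainty-aware optimization targets.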
👥 Authors
Renke Wang
PCA Lab, Nanjing University of Science and Technology, China
Zhenyu Zhang
Associate Professor, Nanjing University, Suzhou campus
Digital Human · 3D Vision
Ying Tai
Nanjing University, School of Intelligent Science and Technology
Jian Yang
Prof. of Computer Science, Nanjing University of Science and Technology
Pattern Recognition · Computer Vision · Biometrics