OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

📅 2026-04-23
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing 3D human body fitting methods predominantly rely on single-modality inputs and require a known metric scale, limiting their effectiveness on AI-generated or scale-distorted data. This work proposes a scale-agnostic, multi-modal fitting framework that handles full scans, partial depth maps, and image captures within a single pipeline. The approach uses a conditional Transformer decoder to map surface points to dense body landmarks, which then drive SMPL-X parameter fitting; an optional plug-and-play image adapter supplies visual cues where geometry is missing, and a dedicated scale predictor rescales subjects to canonical body proportions. The authors report gains of 57.1% to 80.9% over state-of-the-art approaches in daily and loose-clothing scenarios. To the best of their knowledge, it is the first body fitting method to surpass multi-view optimization baselines and the first to reach millimeter-level accuracy on the CAPE and 4D-DRESS benchmarks.

📝 Abstract
Fitting an underlying body model to 3D clothed human assets has been extensively studied, yet most approaches focus on either single-modal inputs such as point clouds or multi-view images alone, often requiring a known metric scale. This constraint is frequently impractical, especially for AI-generated assets where scale distortion is common. We propose OmniFit, a method that can seamlessly handle diverse multi-modal inputs, including full scans, partial depth observations, and image captures, while remaining scale-agnostic for both real and synthetic assets. Our key innovation is a simple yet effective conditional transformer decoder that directly maps surface points to dense body landmarks, which are then used for SMPL-X parameter fitting. In addition, an optional plug-and-play image adapter incorporates visual cues to compensate for missing geometric information. We further introduce a dedicated scale predictor that rescales subjects to canonical body proportions. OmniFit substantially outperforms state-of-the-art methods by 57.1 to 80.9 percent across daily and loose clothing scenarios. To the best of our knowledge, it is the first body fitting method to surpass multi-view optimization baselines and the first to achieve millimeter-level accuracy on the CAPE and 4D-DRESS benchmarks.
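To make the scale issue concrete, the sketch below recovers a global scale from landmark correspondences using the classical closed-form similarity alignment (Umeyama). This is an illustrative stand-in, not OmniFit's learned scale predictor or its SMPL-X fitting stage: the function name `similarity_align`, the synthetic landmark set, and the assumption that observed landmarks come paired with canonical ones are all hypothetical.

```python
import numpy as np


def similarity_align(src, dst):
    """Closed-form least-squares similarity transform: dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points.
    Classical Umeyama alignment, used here only as an illustration;
    it is NOT the learned scale predictor described in the abstract.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Synthetic check: hypothetical canonical landmarks, then a copy that is
# scaled by 1.7, rotated about z, and translated (a "scale-distorted" asset).
rng = np.random.default_rng(0)
canonical = rng.normal(size=(512, 3))
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
observed = 1.7 * canonical @ R_true.T + np.array([0.1, -0.2, 0.5])

# Map the distorted landmarks back to canonical proportions.
s, R, t = similarity_align(observed, canonical)
rescaled = s * observed @ R.T + t
```

On this noise-free synthetic data the recovered `s` is the inverse of the applied scale, and `rescaled` coincides with `canonical`. With real predicted landmarks the alignment would only be approximate.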
Problem

Research questions and friction points this paper is trying to address.

3D body fitting
multi-modal inputs
scale-agnostic
clothed human assets
metric scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

scale-agnostic
dense landmark prediction
multi-modal 3D body fitting
conditional transformer decoder
SMPL-X