OmniFit: Multi-modal 3D Body Fitting via Scale-agnostic Dense Landmark Prediction

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D human body fitting methods predominantly rely on single-modality inputs and require a known metric scale, which limits their effectiveness on AI-generated or scale-distorted data. This work proposes OmniFit, a scale-agnostic, multi-modal fitting framework that handles full scans, partial depth maps, and image captures within a single pipeline. A conditional Transformer decoder maps surface points to dense body landmarks, which are then used to fit SMPL-X parameters; an optional plug-and-play image adapter adds visual cues where geometry is missing, and a dedicated scale predictor rescales subjects to canonical body proportions. The authors report that, to their knowledge, it is the first body fitting method to surpass multi-view optimization baselines and the first to reach millimeter-level accuracy on the CAPE and 4D-DRESS benchmarks. It also outperforms state-of-the-art approaches by 57.1% to 80.9% in daily and loose-clothing scenarios.

📝 Abstract

Fitting an underlying body model to 3D clothed human assets has been extensively studied, yet most approaches focus on either single-modal inputs such as point clouds or multi-view images alone, often requiring a known metric scale. This constraint is frequently impractical, especially for AI-generated assets where scale distortion is common. We propose OmniFit, a method that can seamlessly handle diverse multi-modal inputs, including full scans, partial depth observations, and image captures, while remaining scale-agnostic for both real and synthetic assets. Our key innovation is a simple yet effective conditional transformer decoder that directly maps surface points to dense body landmarks, which are then used for SMPL-X parameter fitting. In addition, an optional plug-and-play image adapter incorporates visual cues to compensate for missing geometric information. We further introduce a dedicated scale predictor that rescales subjects to canonical body proportions. OmniFit substantially outperforms state-of-the-art methods by 57.1 to 80.9 percent across daily and loose clothing scenarios. To the best of our knowledge, it is the first body fitting method to surpass multi-view optimization baselines and the first to achieve millimeter-level accuracy on the CAPE and 4D-DRESS benchmarks.
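To make the idea of rescaling a subject to canonical proportions concrete, here is a minimal sketch. It is not OmniFit's learned scale predictor, whose design the abstract does not describe. It is a closed-form stand-in that assumes dense landmarks have already been predicted and are in one-to-one correspondence with landmarks on a canonical template. The function names (`centroid`, `rms_radius`, `estimate_scale`, `rescale`) and the landmark coordinates are all hypothetical.

```python
import math

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rms_radius(points):
    """Root-mean-square distance of the points from their centroid.
    Invariant to rotation and translation, linear in uniform scale."""
    c = centroid(points)
    sq = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(sq / len(points))

def estimate_scale(observed, canonical):
    """Isotropic factor s such that s * observed matches the spread of canonical."""
    r_obs = rms_radius(observed)
    if r_obs == 0.0:
        raise ValueError("observed landmarks are degenerate (zero spread)")
    return rms_radius(canonical) / r_obs

def rescale(points, s):
    """Scale the points by s about their own centroid."""
    c = centroid(points)
    return [tuple(c[i] + s * (p[i] - c[i]) for i in range(3)) for p in points]

# Hypothetical canonical landmarks (metres): pelvis, head top, two shoulders.
canonical = [(0.0, 0.0, 0.0), (0.0, 1.7, 0.0), (0.4, 1.4, 0.1), (-0.4, 1.4, 0.1)]

# Simulate a scale-distorted asset: rotate 90 degrees about z, enlarge 2.5x, translate.
observed = [(-2.5 * y + 3.0, 2.5 * x - 1.0, 2.5 * z + 0.5) for (x, y, z) in canonical]

s = estimate_scale(observed, canonical)   # approximately 0.4, i.e. 1 / 2.5
rescaled = rescale(observed, s)
```

Because the RMS radius ignores rotation and translation, the estimate depends only on the overall size of the landmark set. That property is what lets a downstream SMPL-X fit operate at a consistent body scale in this sketch. A learned predictor would additionally have to cope with noisy or partial landmarks, which this example does not attempt.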

Problem

Research questions and friction points this paper is trying to address.

3D body fitting

multi-modal inputs

scale-agnostic

clothed human assets

metric scale

Innovation

Methods, ideas, or system contributions that make the work stand out.

scale-agnostic

dense landmark prediction

multi-modal 3D body fitting