🤖 AI Summary
Existing methods for single-image 3D model alignment rely on supervised training with category- and pose-level annotations, which are scarce and limit these methods to a narrow set of object categories. Method: We propose the first weakly supervised framework, constructing a foundation-model-driven joint geometric-semantic feature space; introducing multi-view consistency constraints and a self-supervised triplet loss to mitigate symmetry ambiguity; and designing a texture-invariant, normalized-coordinate-based dense alignment mechanism for precise 9-DoF pose estimation. Results: On ScanNet25k, our method outperforms the prior state-of-the-art weakly supervised approach by +4.3% mean alignment accuracy and, for the first time, surpasses the supervised method ROCA by +2.7% without any pose annotations. On our newly introduced cross-domain benchmark SUN2CAD, it achieves state-of-the-art performance across 20 unseen CAD categories, demonstrating strong zero-shot generalization.
📝 Abstract
One practical approach to inferring 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image. Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories. To address this, we propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories. Our approach derives a novel feature space from foundation features, enforcing multi-view consistency and overcoming the symmetry ambiguities inherent in those features via a self-supervised triplet loss. Additionally, we introduce a texture-invariant pose refinement technique that performs dense alignment in normalized object coordinates, estimated through the enhanced feature space. We conduct extensive evaluations on the real-world ScanNet25k dataset, where our method outperforms SOTA weakly supervised baselines by +4.3% mean alignment accuracy and is the only weakly supervised approach to surpass the supervised ROCA, by +2.7%. To assess generalization, we introduce SUN2CAD, a real-world test set with 20 novel object categories, on which our method achieves SOTA results without prior training on them.
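To make the symmetry-handling idea concrete: a triplet loss of the kind the abstract describes pulls an anchor feature toward a multi-view-consistent positive and pushes it away from the feature of a (near-)symmetric counterpart, so that, e.g., the front and back of a symmetric chair no longer map to the same descriptor. The following is a minimal NumPy sketch, not the paper's implementation; the Euclidean distance, the margin value, and the function name are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hedged sketch of a self-supervised triplet loss for symmetry
    disambiguation (margin and metric are illustrative assumptions):
    - anchor:   feature of an object point in one view
    - positive: feature of the same point in another view
                (multi-view consistency target)
    - negative: feature of its symmetric counterpart, which raw
                foundation features tend to confuse with the anchor
    All inputs are L2-normalized feature vectors."""
    d_pos = np.linalg.norm(anchor - positive)  # same point, other view
    d_neg = np.linalg.norm(anchor - negative)  # symmetric counterpart
    # Hinge: penalize only when the symmetric counterpart is not at
    # least `margin` farther from the anchor than the true match.
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this loss over many such triplets yields features that stay consistent across views while separating symmetric surface points, which is what later enables unambiguous dense alignment in normalized object coordinates.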