CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of generating parametric CAD models directly from unconstrained real-world images, aiming to lower the barrier to digital-twin construction and to mitigate the scarcity of real CAD data. We propose a synthetic-data-driven paradigm: the model is trained exclusively on texture-free synthetic CAD data and relies on a geometric feature encoder for cross-domain generalization; as a further novelty, Direct Preference Optimization (DPO) is introduced into CAD sequence generation and combined with automated code validation to learn geometric constraints without supervision. We also introduce the first dataset of multi-view real CAD images paired with command sequences. On this benchmark, our method significantly outperforms existing approaches, remains robust to variations in illumination, viewpoint, and occlusion, and generalizes to unseen object categories.

📝 Abstract
Creating CAD digital twins from the physical world is crucial for manufacturing, design, and simulation. However, current methods typically rely on costly 3D scanning with labor-intensive post-processing. To provide a user-friendly design process, we explore the problem of reverse engineering from unconstrained real-world CAD images that can be easily captured by users of all experience levels. However, the scarcity of real-world CAD data poses challenges in directly training such models. To tackle these challenges, we propose CADCrafter, an image-to-parametric CAD model generation framework that trains solely on synthetic textureless CAD data while testing on real-world images. To bridge the significant representation disparity between images and parametric CAD models, we introduce a geometry encoder to accurately capture diverse geometric features. Moreover, the texture-invariant properties of the geometric features can also facilitate the generalization to real-world scenarios. Since compiling CAD parameter sequences into explicit CAD models is a non-differentiable process, the network training inherently lacks explicit geometric supervision. To impose geometric validity constraints, we employ direct preference optimization (DPO) to fine-tune our model with automatic code checker feedback on CAD sequence quality. Furthermore, we collected a real-world dataset comprising multi-view images and corresponding CAD command sequence pairs to evaluate our method. Experimental results demonstrate that our approach can robustly handle real unconstrained CAD images, and even generalize to unseen general objects.
Problem

Research questions and friction points this paper is trying to address.

Generating CAD models from unconstrained real-world images
Bridging representation gap between images and parametric CAD
Ensuring geometric validity without explicit geometric supervision during training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates parametric CAD models from unconstrained real-world images
Trains solely on synthetic textureless CAD data while testing on real images
Employs a geometry encoder and DPO fine-tuning with automatic code-checker feedback
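The DPO-with-code-checker idea from the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the toy command format, `is_valid_cad_sequence`, and `build_dpo_pairs` are all assumptions standing in for the paper's actual CAD grammar and validity checker.

```python
# Hedged sketch: build DPO preference pairs for CAD command sequences
# using an automatic validity check, as the abstract describes.
# The sequence format and checker below are illustrative assumptions.

def is_valid_cad_sequence(seq):
    """Toy validity check: all command parameters must be numeric,
    and the sequence must end with an 'extrude' command."""
    if not seq or seq[-1][0] != "extrude":
        return False
    return all(
        isinstance(p, (int, float))
        for _, params in seq
        for p in params
    )

def build_dpo_pairs(candidates):
    """Split sampled candidate sequences by the checker's verdict and
    pair each valid (chosen) sequence with an invalid (rejected) one,
    yielding (chosen, rejected) tuples for DPO fine-tuning."""
    valid = [s for s in candidates if is_valid_cad_sequence(s)]
    invalid = [s for s in candidates if not is_valid_cad_sequence(s)]
    return list(zip(valid, invalid))

# Two sampled candidates for the same input image:
candidates = [
    [("line", (0, 0, 1, 0)), ("extrude", (5,))],    # compiles: chosen
    [("line", (0, 0, 1, 0)), ("arc", ("bad",))],    # malformed: rejected
]
pairs = build_dpo_pairs(candidates)
```

The key point is that the checker's pass/fail verdict supplies a preference signal without requiring differentiable geometric supervision, sidestepping the non-differentiability of CAD compilation.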
Cheng Chen
Nanyang Technological University, Institute for Infocomm Research, A*STAR, Singapore
Jiacheng Wei
Nanyang Technological University
Tianrun Chen
Zhejiang University
Computer Vision · 3D Reconstruction · Computational Imaging · Large Vision-Language Model
Chi Zhang
Westlake University
Xiaofeng Yang
Nanyang Technological University
Shangzhan Zhang
Zhejiang University
Bingchen Yang
University of Chinese Academy of Sciences (UCAS)
Chuan-Sheng Foo
Institute for Infocomm Research, A*STAR, Singapore, Centre for Frontier AI Research, A*STAR, Singapore
Guosheng Lin
Nanyang Technological University
Computer Vision · Machine Learning
Qixing Huang
Associate Professor of Computer Science, UT Austin
Computer Graphics · Computer Vision · Machine Learning · Optimization · Big Data
Fayao Liu
Institute for Infocomm Research, A*STAR
Machine Learning · Computer Vision