CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

📅 2024-12-27
🤖 AI Summary
Existing CAD modeling methods localize geometry in 3D space inaccurately (that is, they struggle to determine origin, orientation, and translation) when conditioned on text or image inputs. To address this, we propose an end-to-end, spatially aware CAD synthesis framework. Our key contributions are: (1) a 3D spatial unfolding mechanism that maps continuous 3D positions and sketch-plane rotation angles into a 1D language space; (2) a discrete encoding of 2D sketch coordinates that makes spatial reasoning more robust; and (3) a spatially aware multimodal large language model that combines cross-modal alignment with geometric sequence generation. On multiple CAD synthesis benchmarks, our method outperforms state-of-the-art approaches, with significant improvements in quantitative metrics. The generated CAD models are geometrically faithful and remain parametrically editable, enabling precise downstream design modifications.
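The summary does not specify the grid resolution, angle step, or token layout, so the following Python sketch is illustrative only. It assumes a hypothetical 64-cell grid per axis over a normalized cube [-1, 1]^3 for sketch-plane origins, three rotation angles quantized into 36 bins of 10 degrees each, and row-major flattening of the resulting 3D cell index into one integer that a language model could emit as a single token. All function names (`quantize`, `unfold_3d`, `encode_origin`, and so on) are hypothetical; the block shows the general idea of unfolding a 3D quantity into a 1D space, not CAD-GPT's actual implementation.

```python
GRID = 64         # hypothetical: cells per axis for positions in [-1, 1]^3
ANGLE_BINS = 36   # hypothetical: 10-degree bins for each rotation angle


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a continuous value in [lo, hi] to an integer bin in [0, bins - 1]."""
    t = (value - lo) / (hi - lo)
    return min(bins - 1, max(0, int(t * bins)))


def dequantize(index: int, lo: float, hi: float, bins: int) -> float:
    """Return the centre of bin `index`, i.e. the decoded continuous value."""
    return lo + (index + 0.5) * (hi - lo) / bins


def unfold_3d(i: int, j: int, k: int, n: int) -> int:
    """Flatten cell (i, j, k) of an n x n x n grid into one 1D index (row-major)."""
    return (i * n + j) * n + k


def fold_3d(token: int, n: int) -> tuple:
    """Invert unfold_3d: recover (i, j, k) from the 1D index."""
    i, rest = divmod(token, n * n)
    j, k = divmod(rest, n)
    return i, j, k


def encode_origin(x: float, y: float, z: float, n: int = GRID) -> int:
    """Turn a continuous sketch-plane origin into a single integer token."""
    i, j, k = (quantize(v, -1.0, 1.0, n) for v in (x, y, z))
    return unfold_3d(i, j, k, n)


def decode_origin(token: int, n: int = GRID) -> tuple:
    """Recover an approximate continuous origin from its token."""
    return tuple(dequantize(c, -1.0, 1.0, n) for c in fold_3d(token, n))


def encode_orientation(theta: float, phi: float, gamma: float,
                       bins: int = ANGLE_BINS) -> int:
    """Quantize three rotation angles (degrees) and unfold them into one token."""
    i, j, k = (quantize(a % 360.0, 0.0, 360.0, bins) for a in (theta, phi, gamma))
    return unfold_3d(i, j, k, bins)


if __name__ == "__main__":
    tok = encode_origin(0.0, 0.5, -0.25)
    print(tok, fold_3d(tok, GRID))                # 134168 (32, 48, 24)
    print(decode_origin(tok))                     # (0.015625, 0.515625, -0.234375)
    print(encode_orientation(90.0, 0.0, 180.0))   # 11682
```

Because the flattening uses `divmod`, the mapping from cell indices to token is exactly invertible; the only loss is quantization error, at most half a cell (1/64 here) per axis.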

📝 Abstract
Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle to infer accurate 3D spatial locations and orientations, leading to inaccuracies in determining the 3D starting points and extrusion directions used to construct geometries. This work introduces CAD-GPT, a CAD synthesis method built on a spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise spatial inference, our approach introduces a 3D Modeling Spatial Mechanism. This mechanism uses a specialized spatial unfolding scheme to map 3D spatial positions and 3D sketch-plane rotation angles into a 1D linguistic feature space, and it discretizes 2D sketch coordinates into an appropriate planar space. Together, these enable precise determination of the spatial starting position, sketch orientation, and 2D sketch coordinate translations. Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively.
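As a hedged illustration of discretizing 2D sketch coordinates into a planar grid, the Python sketch below assumes that points are already expressed in the sketch plane's local 2D frame and uses a hypothetical resolution of 256 intervals per side; neither the frame convention nor the resolution is given in the abstract. It snaps each point to the nearest vertex of a square grid spanning the sketch's bounding box, so every coordinate becomes a small integer that could be emitted as a token. The names `discretize_sketch` and `restore_sketch` are hypothetical, and this is a generic quantization scheme, not the paper's exact encoding.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def discretize_sketch(points: List[Point],
                      resolution: int = 256) -> Tuple[List[Cell], Point, float]:
    """Snap 2D sketch-plane points to the nearest vertex of a square grid.

    The grid spans the sketch's bounding square with `resolution` intervals per
    side, so every coordinate becomes an integer in [0, resolution]. The offset
    and step are returned because they are needed to map integers back.
    Assumes `points` is non-empty.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    # One shared scale for both axes preserves the sketch's aspect ratio;
    # fall back to 1.0 for a degenerate (single-point) sketch.
    size = max(max(xs) - x0, max(ys) - y0) or 1.0
    step = size / resolution
    cells = [(round((x - x0) / step), round((y - y0) / step)) for x, y in points]
    return cells, (x0, y0), step


def restore_sketch(cells: List[Cell], offset: Point, step: float) -> List[Point]:
    """Map integer grid cells back to continuous sketch-plane coordinates."""
    x0, y0 = offset
    return [(x0 + i * step, y0 + j * step) for i, j in cells]


if __name__ == "__main__":
    profile = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.3, 0.7)]
    cells, offset, step = discretize_sketch(profile)
    print(cells)  # [(0, 0), (256, 0), (256, 128), (38, 90)]
    restored = restore_sketch(cells, offset, step)
    worst = max(max(abs(a - c), abs(b - d))
                for (a, b), (c, d) in zip(profile, restored))
    print(worst <= step / 2)  # True
```

Snapping to grid vertices rather than bin centres keeps points that lie on the bounding box, such as the corners of this profile, exactly representable; interior points are recovered to within half a grid step per axis.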
Problem

CAD model design
spatial understanding
input interpretation
Innovation

3D Modeling
Spatial Mechanism
Design Accuracy
Siyu Wang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
Cailian Chen
Shanghai Jiao Tong University
VANET, Sensor Networks and Applications, Industrial Wireless Networks, Multi-agent Systems, Distributed Estimation and Detection
Xinyi Le
Professor, Automation, Shanghai Jiao Tong University
Computational Intelligence, Neural Networks, Control
Qimin Xu
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
Lei Xu
Institute of Cyber Science and Technology, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China
Yanzhou Zhang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
Jie Yang
University of Minnesota Twin Cities, Saint Paul, MN, USA