CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models

📅 2026-05-03

🤖 AI Summary
Existing generative CAD systems are constrained by simplified representations and limited datasets. As a result, they support only basic sketch-and-extrude operations and struggle to produce complex design histories. This work proposes CADFS, a data-centric framework that represents CAD models as FeatureScript programs and builds a large-scale dataset of 450,000 real-world models spanning 15 modeling operations. A reconstruction pipeline produces clean, executable FeatureScript programs paired with multimodal annotations, including textual descriptions aligned with the program representation. Fine-tuning a vision-language model on these programs yields more accurate, diverse, and feature-rich CAD models than prior frameworks, with state-of-the-art results in both text-conditioned CAD generation and image-based reconstruction.
📝 Abstract
We introduce CADFS, a data-centric framework that enables large vision-language models to generate complex CAD design histories. Existing generative CAD systems are restricted to sketch-extrude operations due to simplified representations and limited datasets. We address this by introducing a FeatureScript-based representation and constructing a dataset of 450k real-world CAD models spanning 15 modeling operations. We obtain the dataset via a new pipeline that reconstructs clean, executable FeatureScript programs and provides multimodal annotations. Fine-tuning a VLM on this representation yields state-of-the-art results in text-conditioned CAD generation and image-based reconstruction, producing more accurate, diverse, and feature-rich designs than prior frameworks. Ablations show that each individual component of our framework, i.e., the FeatureScript representation, the extended operation set, and representation-aligned textual descriptions, significantly improves performance. Our framework substantially broadens the complexity and realism achievable in generative CAD. The CADFS framework and the new dataset are available at https://voyleg.github.io/cadfs/.
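To make "design history" and "operation set" concrete: a parametric CAD model is an ordered sequence of feature operations, and a representation limited to sketch-extrude cannot express steps such as edge rounding. The minimal Python sketch below assumes a hypothetical `Feature` record, a hypothetical helper `unsupported_ops`, and illustrative operation names (`sketch`, `extrude`, `fillet`, `chamfer`). None of these are the CADFS data format, its FeatureScript programs, or its list of 15 operations.

```python
from dataclasses import dataclass, field

# Hypothetical operation vocabulary of a sketch-extrude-only representation.
# Names are illustrative; they are not taken from the CADFS paper.
SKETCH_EXTRUDE_OPS = {"sketch", "extrude"}


@dataclass
class Feature:
    """One step in a parametric design history (hypothetical record layout)."""
    op: str
    params: dict = field(default_factory=dict)


def unsupported_ops(history, vocabulary):
    """Return operation types in `history` that `vocabulary` cannot express,
    in order of first appearance and without duplicates."""
    missing = []
    for feature in history:
        if feature.op not in vocabulary and feature.op not in missing:
            missing.append(feature.op)
    return missing


# A small bracket-like part: sketch a profile, extrude it, then round and chamfer edges.
history = [
    Feature("sketch", {"plane": "Top"}),
    Feature("extrude", {"depth_mm": 10.0}),
    Feature("fillet", {"radius_mm": 2.0}),
    Feature("chamfer", {"distance_mm": 1.0}),
]

print(unsupported_ops(history, SKETCH_EXTRUDE_OPS))  # prints ['fillet', 'chamfer']
```

Extending the vocabulary (for example, `SKETCH_EXTRUDE_OPS | {"fillet", "chamfer"}`) makes the same history fully expressible, which is the intuition behind broadening the operation set rather than only scaling sketch-extrude data.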
Problem

Research questions and friction points this paper is trying to address.

generative CAD
design history
complexity
realism
limited datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

FeatureScript
generative CAD
vision-language models
CAD dataset
design history generation