🤖 AI Summary
This work addresses a limitation of existing text-to-CAD generation methods: they decode text directly into executable code without modeling assembly hierarchy or geometric constraints, which enlarges the search space and lets local errors accumulate. To overcome this, the authors propose a hierarchical, geometry-aware graph as an intermediate representation that models parts and subassemblies as nodes and encodes geometric constraints as edges. The framework first predicts the structural layout and associated constraints, then uses them to condition the generation of CAD modeling operations and code. A structure-aware progressive curriculum learning strategy uses controlled structural edits to construct graded tasks and synthesize examples at the boundary of the model's capability. The study also presents a 12K-sample Text-to-CAD dataset, described as the first annotated with exploded views and explicit geometric constraints, along with tailored graph- and constraint-oriented evaluation metrics. Experiments show that the proposed method consistently outperforms existing approaches in geometric fidelity and constraint satisfaction.
📝 Abstract
Text-to-CAD code generation is a long-horizon task that translates textual instructions into long sequences of interdependent operations. Existing methods typically decode text directly into executable code (e.g., bpy) without explicitly modeling assembly hierarchy or geometric constraints, which enlarges the search space, accumulates local errors, and often causes cascading failures in complex assemblies. To address this issue, we propose a hierarchical and geometry-aware graph as an intermediate representation. The graph models multi-level parts and components as nodes and encodes explicit geometric constraints as edges. Instead of mapping text directly to code, our framework first predicts structure and constraints, then conditions action sequencing and code generation on them, thereby improving geometric fidelity and constraint satisfaction. We further introduce a structure-aware progressive curriculum learning strategy that constructs graded tasks through controlled structural edits, explores the model's capability boundary, and synthesizes boundary examples for iterative training. In addition, we build a 12K-sample dataset with instructions, decomposition graphs, action sequences, and bpy code, together with graph- and constraint-oriented evaluation metrics. Extensive experiments show that our method consistently outperforms existing approaches in both geometric fidelity and geometric constraint satisfaction.
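To make the intermediate representation concrete, the following is a minimal Python sketch of a hierarchical graph with parts and subassemblies as nodes and geometric constraints as edges. It is an illustration under stated assumptions, not the authors' schema: the class names (`Node`, `Constraint`, `AssemblyGraph`), the constraint labels (`"coaxial"`, `"contact"`), and the `drop_part` edit are all hypothetical. `drop_part` shows one possible form a "controlled structural edit" could take when deriving an easier task from a harder one.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A part or subassembly in the hierarchy (hypothetical schema)."""
    name: str
    children: list["Node"] = field(default_factory=list)

    def walk(self):
        """Yield this node and every descendant, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass(frozen=True)
class Constraint:
    """A geometric constraint stored as an edge between two named nodes."""
    kind: str    # illustrative label, e.g. "coaxial" or "contact"
    source: str
    target: str


@dataclass
class AssemblyGraph:
    root: Node
    constraints: list[Constraint] = field(default_factory=list)

    def part_names(self):
        """Names of leaf parts (nodes without children), depth first."""
        return [n.name for n in self.root.walk() if not n.children]

    def dangling_constraints(self):
        """Constraints that reference a node missing from the hierarchy."""
        known = {n.name for n in self.root.walk()}
        return [c for c in self.constraints
                if c.source not in known or c.target not in known]


def drop_part(graph, name):
    """Return a simpler copy without the named node, its subtree,
    and every constraint that references that name directly."""
    def prune(node):
        kept = [prune(c) for c in node.children if c.name != name]
        return Node(node.name, kept)

    kept_edges = [c for c in graph.constraints
                  if name not in (c.source, c.target)]
    return AssemblyGraph(prune(graph.root), kept_edges)


# Example: a cart with a frame, an axle, and a wheel subassembly.
wheel = Node("wheel", [Node("rim"), Node("tire")])
cart = Node("cart", [Node("frame"), Node("axle"), wheel])
graph = AssemblyGraph(cart, [
    Constraint("coaxial", "axle", "wheel"),
    Constraint("contact", "frame", "axle"),
    Constraint("contact", "rim", "tire"),
])

easier = drop_part(graph, "axle")  # a structurally simpler variant
print(graph.part_names())
print(easier.part_names(), easier.constraints)
```

In this sketch, a predicted graph can be checked for structural consistency with `dangling_constraints()` before any code is generated from it, which mirrors the idea of predicting structure and constraints first and conditioning code generation on them.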