CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers?

📅 2024-12-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Current CAD systems lack general-purpose intelligent agents that can understand and execute design tasks through natural multimodal interaction. Method: The paper proposes a multimodal CAD assistant in which a vision and large language model (VLLM) handles high-level planning and is tightly integrated with domain-specific tools, including the FreeCAD Python API, so that it can accept joint natural-language and image inputs. The system uses tool-augmented reasoning, secure Python sandbox execution, and awareness of the geometric state to generate, iteratively execute, and dynamically verify CAD commands. Contribution/Results: The paper introduces what it describes as the first VLLM-CAD co-design paradigm enabling adaptive, closed-loop editing across diverse tasks. Evaluated on multiple CAD benchmarks, the system performs sketch generation, parametric modeling, and assembly reasoning, demonstrating end-to-end capability in complex CAD workflows while reducing manual intervention.

📝 Abstract
We propose CAD-Assistant, a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CAD-specific modules. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via its Python API. Our framework is able to assess the impact of generated CAD commands on geometry and adapts subsequent actions based on the evolving state of the CAD design. We consider a wide range of CAD-specific tools including Python libraries, modules of the FreeCAD Python API, helpful routines, rendering functions and other specialized modules. We evaluate our method on multiple CAD benchmarks and qualitatively demonstrate the potential of tool-augmented VLLMs as generic CAD task solvers across diverse CAD workflows.
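The plan, execute, observe cycle described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `stub_planner` is a hypothetical stand-in for the VLLM planner, the sketch is a plain dictionary rather than a FreeCAD document, and `exec` is used for brevity and is not a secure sandbox.

```python
import contextlib
import io
import traceback


def execute_action(code, namespace):
    """Run one generated code action in a persistent namespace.

    Returns the captured stdout, or the traceback text if the action fails.
    NOTE: plain exec() is used for illustration only; it is not a sandbox.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


def stub_planner(query, history):
    """Hypothetical stand-in for the VLLM planner.

    Returns the next code action, or None when the task is considered done.
    A real planner would condition on the query and on past observations.
    """
    scripted = [
        "sketch = {'lines': [((0, 0), (4, 0))]}\nprint(len(sketch['lines']), 'line(s)')",
        "sketch['lines'].append(((4, 0), (4, 3)))\nprint(len(sketch['lines']), 'line(s)')",
    ]
    return scripted[len(history)] if len(history) < len(scripted) else None


def run_agent(query, planner, max_steps=10):
    """Alternate planning and execution, feeding each observation back to the planner."""
    namespace = {}  # interpreter state that persists across actions
    history = []    # list of (action, observation) pairs
    for _ in range(max_steps):
        action = planner(query, history)
        if action is None:
            break
        observation = execute_action(action, namespace)
        history.append((action, observation))
    return namespace, history


namespace, history = run_agent("Add a vertical line to the sketch", stub_planner)
```

Keeping one namespace across steps is what lets a later action build on objects created by an earlier one, and the recorded observations are what a planner would use to adapt its next action to the evolving design state.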
Problem

Research questions and friction points this paper is trying to address.

Develops a general-purpose CAD agent for AI-assisted design
Addresses multimodal user queries with iterative CAD command execution
Evaluates the approach on multiple CAD benchmarks and shows qualitative results across diverse CAD workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision and Large Language Model for CAD planning
Tool-augmentation with CAD-specific Python tools
Iterative command execution via FreeCAD API
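One common way to expose Python tools to a planner is to list each tool's signature and docstring in the prompt. The sketch below illustrates that idea only; the two geometry helpers and `describe_tools` are hypothetical examples, not the tool set or prompt format used by CAD-Assistant.

```python
import inspect


def sketch_length(lines):
    """Return the total length of a list of 2D line segments."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5 for (x1, y1), (x2, y2) in lines)


def bounding_box(lines):
    """Return (min_x, min_y, max_x, max_y) over all segment endpoints."""
    xs = [x for segment in lines for x, _ in segment]
    ys = [y for segment in lines for _, y in segment]
    return min(xs), min(ys), max(xs), max(ys)


def describe_tools(tools):
    """Build a plain-text listing (name, signature, docstring) for a planner prompt."""
    entries = []
    for tool in tools:
        entries.append(f"{tool.__name__}{inspect.signature(tool)}: {inspect.getdoc(tool)}")
    return "\n".join(entries)


# Hypothetical tool set made available to the planner.
TOOLS = [sketch_length, bounding_box]
prompt_section = describe_tools(TOOLS)
print(prompt_section)
```

Because the listing is generated from the functions themselves, adding a tool to `TOOLS` is enough to make it visible to the planner without editing the prompt by hand.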
Dimitrios Mallis
SnT, University of Luxembourg
Ahmet Serdar Karadeniz
SnT, University of Luxembourg
Sebastian Cavada
Danila Rukhovich
University of Luxembourg
deep learning, computer vision, 3D scene understanding
N. Foteinopoulou
SnT, University of Luxembourg
K. Cherenkova
SnT, University of Luxembourg, Artec3D, Luxembourg
Anis Kacem
Research Scientist in Computer Vision, University of Luxembourg, SnT
Computer Vision, Pattern Recognition, Machine Learning
D. Aouada
SnT, University of Luxembourg