CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation

📅 2026-05-11

🤖 AI Summary

This work addresses the lack of a unified benchmark for evaluating how well AI methods recover editable CAD programs from images or 3D observations. It introduces CADBench, an evaluation framework that supports five input modalities (clean meshes, noisy meshes, single-view renders, photorealistic renders, and multi-view renders), spans six benchmark families, and reports six metrics covering geometric fidelity, executability, and program compactness. By consolidating multi-source STEP/B-rep data, stratifying STEP-based samples by B-rep face count, and running a large-scale automated generation and validation pipeline, CADBench enables controlled analysis of model capabilities. An evaluation of eleven CAD-specialized and general-purpose vision-language systems, which together generated more than 1.4 million CAD programs, shows that reconstruction quality degrades with geometric complexity, that CAD-specialized models can be brittle under modality shift, and that model rankings change across metrics. These findings position CADBench as a diagnostic testbed for editable 3D reconstruction.

📝 Abstract

Recovering editable CAD programs from images or 3D observations is central to AI-assisted design, but progress is difficult to measure because existing evaluations are fragmented across datasets, modalities, and metrics. We introduce CADBench, a unified benchmark for multimodal CAD program generation. CADBench contains 18,000 evaluation samples spanning six benchmark families derived from DeepCAD, Fusion 360, ABC, MCB, and Objaverse; five input modalities including clean meshes, noisy meshes, single-view renders, photorealistic renders, and multi-view renders; and six metrics covering geometric fidelity, executability, and program compactness. STEP-based families are stratified by B-rep face count and all families are diversity-sampled to support controlled analysis across complexity and object variation. We benchmark eleven CAD-specialized and general-purpose vision-language systems, generating more than 1.4 million CAD programs. Under idealized inputs, specialized mesh-to-CAD models substantially outperform code-generating VLMs, which remain far from reliable CAD program reconstruction. CADBench further reveals three recurring failure modes: reconstruction quality degrades with geometric complexity, CAD-specialized models can be brittle under modality shift, and model rankings change across metrics. Together, these results position CADBench as a diagnostic testbed for measuring progress in editable 3D reconstruction and multimodal CAD understanding. The benchmark is publicly available at https://huggingface.co/datasets/DeCoDELab/CADBench.
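
The abstract states that STEP-based families are stratified by B-rep face count, but it does not give the bin edges or the sampling procedure. The following is a minimal Python sketch of what such stratification could look like, not the authors' pipeline. The bin edges in `FACE_COUNT_BINS`, the `face_count` field, and the helper names `bin_label` and `stratified_sample` are all hypothetical, and the diversity-sampling step mentioned in the abstract is replaced here by plain uniform sampling within each bin.

```python
import random
from collections import defaultdict

# Hypothetical complexity bins (inclusive face-count ranges).
# The real CADBench bin edges are not given in the abstract.
FACE_COUNT_BINS = [(1, 10), (11, 50), (51, 200), (201, None)]


def bin_label(face_count):
    """Return the label of the bin that contains face_count."""
    for lo, hi in FACE_COUNT_BINS:
        if face_count >= lo and (hi is None or face_count <= hi):
            return f"{lo}-{hi}" if hi is not None else f"{lo}+"
    raise ValueError(f"face_count {face_count} falls outside all bins")


def stratified_sample(samples, per_bin, seed=0):
    """Group samples by face-count bin, then draw up to per_bin from each.

    Uniform sampling stands in for the benchmark's diversity sampling,
    which the abstract does not specify.
    """
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for sample in samples:
        by_bin[bin_label(sample["face_count"])].append(sample)
    return {
        label: rng.sample(members, min(per_bin, len(members)))
        for label, members in by_bin.items()
    }


# Toy data: one synthetic part per face count from 1 to 300.
toy_parts = [{"id": f"part_{i}", "face_count": i} for i in range(1, 301)]
picked = stratified_sample(toy_parts, per_bin=5)
for label, members in picked.items():
    print(label, [m["id"] for m in members])
```

Drawing the same number of parts from each bin keeps simple and complex geometry equally represented, which is what makes a per-complexity comparison of models meaningful.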

Problem

Research questions and friction points this paper is trying to address.

CAD program generation

multimodal benchmark

editable 3D reconstruction

AI-assisted design

evaluation metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark

CAD program generation

editable 3D reconstruction