BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Industrial-scale programmatic CAD generation lacks a unified, real-world evaluation benchmark, which makes it difficult to assess models' capabilities in geometric understanding, parameter inference, and executable program synthesis. This work proposes BenchCAD, the first comprehensive benchmark tailored to real-world industrial CAD reasoning. It comprises 17,900 executable CadQuery programs across 106 part families and supports multimodal evaluation through visual question answering, code question answering, image-to-code generation, and instruction-guided editing. Systematic evaluation on this benchmark shows that current multimodal large language models can reconstruct coarse geometry but fall short in parameter abstraction, fine-grained structural comprehension, and critical operations such as sweeping and lofting. Fine-tuning improves in-domain performance, but generalization to unseen part families remains limited.

📝 Abstract

Industrial Computer-Aided Design (CAD) code generation requires models to produce executable parametric programs from visual or textual inputs. Beyond recognizing the outer shape of a part, this task involves understanding its 3D structure, inferring engineering parameters, and choosing CAD operations that reflect how the part would be designed and manufactured. Despite the promise of multimodal large language models (MLLMs) for this task, they are rarely evaluated on whether these capabilities jointly hold in realistic industrial CAD settings. We present BenchCAD, a unified benchmark for industrial CAD reasoning. BenchCAD contains 17,900 execution-verified CadQuery programs across 106 industrial part families, including bevel gears, compression springs, twist drills, and other reusable engineering designs. It evaluates models through visual question answering, code question answering, image-to-code generation, and instruction-guided code editing, enabling fine-grained analysis across perception, parametric abstraction, and executable program synthesis. Across 10+ frontier models, BenchCAD shows that current systems often recover coarse outer geometry but fail to produce faithful parametric CAD programs. Common failures include missing fine 3D structure, misinterpreting industrial design parameters, and replacing essential operations such as sweeps, lofts, and twist-extrudes with simpler sketch-and-extrude patterns. Fine-tuning and reinforcement learning improve in-distribution performance, but generalization to unseen part families remains limited. These results position BenchCAD as a benchmark for measuring and improving the industrial readiness of multimodal CAD automation.
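
To make the "operation substitution" failure mode concrete, here is a minimal sketch of how one might check whether a generated CadQuery program dropped an operation that the reference program relies on. This is not BenchCAD's evaluation code: the helper names (`cad_operations`, `dropped_advanced_ops`), the choice of tracked operations, and the two example programs are hypothetical and chosen for illustration. The programs are only parsed with Python's standard `ast` module, never executed, so CadQuery does not need to be installed.

```python
import ast

# CadQuery Workplane method names. Treating these three as the "essential"
# operations is an assumption of this sketch, based on the abstract's examples.
ADVANCED_OPS = {"sweep", "loft", "twistExtrude"}
ALL_OPS = ADVANCED_OPS | {"extrude", "revolve"}


def cad_operations(source):
    """Return the tracked CAD operations that appear as method calls in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        # Match calls of the form <expr>.<name>(...), e.g. wp.sweep(path).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr in ALL_OPS:
                found.add(node.func.attr)
    return found


def dropped_advanced_ops(reference, generated):
    """Advanced operations used by the reference but absent from the generated code."""
    return (cad_operations(reference) & ADVANCED_OPS) - cad_operations(generated)


# Hypothetical reference program: a circle swept along a spline path.
REFERENCE = '''
import cadquery as cq
path = cq.Workplane("XZ").spline([(0, 0), (5, 10), (0, 20)])
result = cq.Workplane("XY").circle(1.0).sweep(path)
'''

# Hypothetical model output: the sweep collapsed into a straight extrude.
GENERATED = '''
import cadquery as cq
result = cq.Workplane("XY").circle(1.0).extrude(20)
'''

print(dropped_advanced_ops(REFERENCE, GENERATED))  # {'sweep'}
```

A purely syntactic check like this only flags which operations are missing. It says nothing about whether the resulting geometry or parameters are faithful, which would require executing both programs and comparing the solids.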

Problem

Research questions and friction points this paper is trying to address.

industrial CAD

programmatic CAD

parametric modeling

multimodal LLMs

executable code generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

BenchCAD

programmatic CAD

multimodal LLMs