🤖 AI Summary
Existing methods struggle to handle the multimodal 3D CAD editing requests that professional designers pose in real-world industrial settings. This work introduces the first multimodal instruction benchmark tailored to expert-level CAD editing, constructed by recording videos of designers simultaneously narrating, annotating, and performing edits. The recordings capture authentic tasks involving speech, gestures, sketching, and screen interactions, going beyond the purely text-based conditioning of prior work. The dataset is used to measure the performance gap between leading foundation models and human experts: in human acceptability tests, the best model (GPT-5.2) lags behind experts by 53 percentage points, underscoring the task’s difficulty and providing a realistic, industry-aligned evaluation benchmark for future research.
📝 Abstract
We introduce neuralCAD-Edit, the first benchmark for editing 3D CAD models collected from expert CAD engineers. Instead of relying on text conditioning, as in prior work, we collect realistic CAD editing requests by capturing videos of professional designers interacting directly with CAD models in CAD software while talking, pointing, and drawing. We recruited ten consenting designers to contribute to this contained study. We benchmark leading foundation models against human CAD experts carrying out the same edits, and find a large performance gap in both automatic metrics and human evaluations. Even the best foundation model (GPT 5.2) scores 53 percentage points lower than CAD experts in human acceptance trials, demonstrating the challenge posed by neuralCAD-Edit. We hope neuralCAD-Edit will provide a solid foundation against which 3D CAD editing approaches and foundation models can be developed. Code/data: https://autodeskailab.github.io/neuralCAD-Edit