🤖 AI Summary
This work addresses the lack of quantitative geometric and structural fidelity evaluation methods for 3D model generation by large language models (LLMs). We propose the first human-in-the-loop, multidimensional quantitative assessment framework. Methodologically, it integrates four core metrics—volumetric accuracy, surface alignment, dimensional fidelity, and topological complexity—and supports four input modalities: 2D orthographic views, isometric sketches, geometric structure trees, and code-level correction prompts, enhanced by multi-scale similarity and complexity measures. Our key contribution is the first interpretable, reproducible, quantitative evaluation system tailored specifically for CAD generation tasks, substantially outperforming conventional subjective visual assessment. In an L-bracket case study, code-level prompting achieved perfect reconstruction across all metrics, underscoring the critical role of semantic richness in generation quality. The framework significantly improves both evaluation efficiency and accuracy, thereby enabling democratized CAD design, reverse engineering, and rapid prototyping.
📝 Abstract
Large Language Models are increasingly capable of interpreting multimodal inputs to generate complex 3D shapes, yet robust methods to evaluate geometric and structural fidelity remain underdeveloped. This paper introduces a human in the loop framework for the quantitative evaluation of LLM generated 3D models, supporting applications such as democratization of CAD design, reverse engineering of legacy designs, and rapid prototyping. We propose a comprehensive suite of similarity and complexity metrics, including volumetric accuracy, surface alignment, dimensional fidelity, and topological intricacy, to benchmark generated models against ground truth CAD references. Using an L bracket component as a case study, we systematically compare LLM performance across four input modalities: 2D orthographic views, isometric sketches, geometric structure trees, and code based correction prompts. Our findings demonstrate improved generation fidelity with increased semantic richness, with code level prompts achieving perfect reconstruction across all metrics. A key contribution of this work is demonstrating that our proposed quantitative evaluation approach enables significantly faster convergence toward the ground truth, especially compared to traditional qualitative methods based solely on visual inspection and human intuition. This work not only advances the understanding of AI assisted shape synthesis but also provides a scalable methodology to validate and refine generative models for diverse CAD applications.