Human-in-the-Loop: Quantitative Evaluation of 3D Models Generation by Large Language Models

📅 2025-09-06

📈 Citations: 0

✨ Influential: 0

career value

218K/year

🤖 AI Summary

This work addresses the lack of quantitative geometric and structural fidelity evaluation methods for 3D model generation by large language models (LLMs). We propose the first human-in-the-loop, multidimensional quantitative assessment framework. Methodologically, it integrates four core metrics—volumetric accuracy, surface alignment, dimensional fidelity, and topological complexity—and supports four input modalities: 2D orthographic views, isometric sketches, geometric structure trees, and code-level correction prompts, enhanced by multi-scale similarity and complexity measures. Our key contribution is the first interpretable, reproducible, quantitative evaluation system tailored specifically for CAD generation tasks, substantially outperforming conventional subjective visual assessment. In an L-bracket case study, code-level prompting achieved perfect reconstruction across all metrics, underscoring the critical role of semantic richness in generation quality. The framework significantly improves both evaluation efficiency and accuracy, thereby enabling democratized CAD design, reverse engineering, and rapid prototyping.

Technology Category

Application Category

📝 Abstract

Large Language Models are increasingly capable of interpreting multimodal inputs to generate complex 3D shapes, yet robust methods to evaluate geometric and structural fidelity remain underdeveloped. This paper introduces a human in the loop framework for the quantitative evaluation of LLM generated 3D models, supporting applications such as democratization of CAD design, reverse engineering of legacy designs, and rapid prototyping. We propose a comprehensive suite of similarity and complexity metrics, including volumetric accuracy, surface alignment, dimensional fidelity, and topological intricacy, to benchmark generated models against ground truth CAD references. Using an L bracket component as a case study, we systematically compare LLM performance across four input modalities: 2D orthographic views, isometric sketches, geometric structure trees, and code based correction prompts. Our findings demonstrate improved generation fidelity with increased semantic richness, with code level prompts achieving perfect reconstruction across all metrics. A key contribution of this work is demonstrating that our proposed quantitative evaluation approach enables significantly faster convergence toward the ground truth, especially compared to traditional qualitative methods based solely on visual inspection and human intuition. This work not only advances the understanding of AI assisted shape synthesis but also provides a scalable methodology to validate and refine generative models for diverse CAD applications.

Problem

Research questions and friction points this paper is trying to address.

Evaluating geometric and structural fidelity of LLM-generated 3D models

Developing quantitative metrics for benchmarking against ground truth CAD

Comparing LLM performance across four different input modalities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-loop quantitative evaluation framework

Multi-metric assessment including volumetric accuracy

Code-based prompts enable perfect reconstruction

🔎 Similar Papers

No similar papers found.

Toyota Research Institute

Los Altos, CA

Senior Research Engineer, Mechanical Intuition in Multimodal Models

Toyota Research Institute

Los Altos, CA / Cambridge, MA

Research Scientist Intern, Multimodal AI (PhD)