Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects

📅 2025-04-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing text-to-3D evaluation metrics (e.g., PSNR, CLIP) either rely on ground-truth references or measure only text fidelity, and so cannot provide a comprehensive, reference-free assessment of generative quality. To address this, we propose the first automatic, reference-free evaluation framework for text-generated 3D objects. Our method fine-tunes a vision large language model (vLLM) on multi-scale rendered 3D normal maps to jointly model alignment among textual semantics, geometric structure, and visual appearance. It is the first to integrate text–geometry–vision multimodal semantic understanding into a unified large-model evaluation pipeline, enhanced by geometry-aware prompt engineering for fine-grained quality discrimination. Extensive user preference studies show that our framework significantly outperforms PSNR, CLIP, and general-purpose multimodal models, establishing a new benchmark for text-to-3D evaluation.
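The summary describes the judging step as a vLLM queried with multi-scale normal-map renderings plus a geometry-aware prompt. A minimal sketch of that prompt construction is below; the function name, prompt wording, and scale parameters are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of Gen3DEval-style geometry-aware prompting (names and
# wording are illustrative, not taken from the paper): build a query for a
# vision LLM judge from multi-scale renderings of an object's surface normals.

def build_eval_prompt(text_prompt, view_scales):
    """Compose a geometry-aware instruction for a vLLM judge.

    text_prompt: the text used to generate the 3D object.
    view_scales: render scales (e.g. zoom levels) of the normal-map views
                 attached alongside this prompt.
    """
    views = ", ".join(f"scale {s}x" for s in view_scales)
    return (
        f"You are judging a 3D object generated from the prompt: '{text_prompt}'.\n"
        f"Attached are surface-normal renderings at {views}.\n"
        "Rate each criterion on a 1-5 scale:\n"
        "1. Text fidelity: does the object match the prompt?\n"
        "2. Appearance: is the visual quality high?\n"
        "3. Surface quality: are the normals smooth and artifact-free?"
    )

prompt = build_eval_prompt("a ceramic teapot", view_scales=[1, 2, 4])
print(prompt)
```

In a real pipeline, the rendered normal-map images would be attached to this text when calling the fine-tuned model.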

📝 Abstract
Rapid advancements in text-to-3D generation require robust and scalable evaluation metrics that align closely with human judgment, a need unmet by current metrics such as PSNR and CLIP, which require ground-truth data or focus only on prompt fidelity. To address this, we introduce Gen3DEval, a novel evaluation framework that leverages vision large language models (vLLMs) specifically fine-tuned for 3D object quality assessment. Gen3DEval evaluates text fidelity, appearance, and surface quality by analyzing 3D surface normals, without requiring ground-truth comparisons, bridging the gap between automated metrics and user preferences. Compared to state-of-the-art task-agnostic models, Gen3DEval demonstrates superior performance in user-aligned evaluations, positioning it as a comprehensive and accessible benchmark for future research on text-to-3D generation. The project page can be found here: https://shalini-maiti.github.io/gen3deval.github.io/
Problem

Research questions and friction points this paper is trying to address.

Develop robust metrics for text-to-3D generation evaluation
Align automated 3D object assessment with human judgment
Evaluate text fidelity, appearance, and surface quality without ground-truth data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vLLMs for 3D object quality assessment
Analyzes 3D surface normals without requiring ground-truth references
Aligns automated metrics with user preferences
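Since the framework scores three axes (text fidelity, appearance, surface quality) over multiple rendered views, a simple way to picture the final metric is per-axis averaging across views. The equal-weight aggregation below is an assumption for illustration, not the paper's actual formula.

```python
# Illustrative aggregation of per-view vLLM judge scores into one
# object-level score. Equal weighting across axes is an assumption,
# not the method described in the paper.

def aggregate_scores(per_view_scores):
    """per_view_scores: list of dicts, one per rendered view,
    each mapping axis name -> score on a 1-5 scale."""
    axes = ("text_fidelity", "appearance", "surface_quality")
    # Average each axis over the views, then average the axis means.
    axis_means = {
        a: sum(v[a] for v in per_view_scores) / len(per_view_scores)
        for a in axes
    }
    overall = sum(axis_means.values()) / len(axes)
    return axis_means, overall

views = [
    {"text_fidelity": 4, "appearance": 3, "surface_quality": 5},
    {"text_fidelity": 4, "appearance": 4, "surface_quality": 4},
]
means, overall = aggregate_scores(views)
print(means, overall)  # overall is 4.0 for this example
```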