Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation

📅 2025-04-25
🤖 AI Summary
Current 3D generative systems struggle to simultaneously achieve geometric fidelity, semantic coherence, and visual quality. Moreover, mainstream evaluation metrics either neglect geometric attributes or rely on opaque multimodal large language models, lacking fine-grained interpretability and alignment with human perceptual judgment. To address this, we propose the first multi-foundation-model collaborative probing framework specifically designed for 3D generative content assessment. Our framework integrates multi-view rendering, cross-modal feature alignment, geometric consistency verification, and specialized analyzers—including CLIP, depth estimation, SAM, and mesh-based modules—to deliver pixel-level quantification and 3D spatial feedback. The method significantly enhances interpretability and human visual alignment, accurately identifying semantic-geometric inconsistencies across leading 3D generation models. Empirically, it achieves a 32% improvement in Spearman correlation with human judgments over baseline metrics.
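The agreement with human judgments mentioned above is reported as a Spearman correlation, which compares the rank ordering of metric scores against the rank ordering of human ratings. The following is a minimal sketch of that statistic in plain Python, not the paper's evaluation code; the function names and the sample values in the usage note are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(metric_scores, human_ratings):
    """Spearman rank correlation: the Pearson correlation of the two rank lists."""
    if len(metric_scores) != len(human_ratings) or len(metric_scores) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(metric_scores), average_ranks(human_ratings)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

For example, `spearman([0.1, 0.4, 0.35, 0.8], [1, 3, 2, 4])` evaluates to 1.0 (up to floating-point rounding) because both sequences order the four hypothetical assets identically, even though the raw values are on different scales.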

📝 Abstract
Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. To effectively assess the quality of the generated 3D data, there is a need for a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistency of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.
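The idea of measuring consistency between two sources of evidence at the pixel level can be illustrated with surface normals. The sketch below is an assumption-laden illustration, not Eval3D's actual formulation: it supposes that one normal map comes from the generated asset's own geometry and another from an independent estimator, and it reports where the two disagree. The function names, the mask convention, and the 30-degree threshold are all hypothetical.

```python
import math


def angular_error_map(normals_a, normals_b, mask):
    """Per-pixel angle in degrees between two H x W maps of 3D normal vectors.

    Pixels outside the mask, or with a zero-length normal, are stored as None.
    """
    error_map = []
    for row_a, row_b, row_m in zip(normals_a, normals_b, mask):
        out_row = []
        for a, b, inside in zip(row_a, row_b, row_m):
            norm_a = math.sqrt(sum(x * x for x in a))
            norm_b = math.sqrt(sum(x * x for x in b))
            if not inside or norm_a == 0 or norm_b == 0:
                out_row.append(None)
                continue
            dot = sum(x * y for x, y in zip(a, b))
            # Clamp to [-1, 1] so rounding error cannot break acos.
            cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
            out_row.append(math.degrees(math.acos(cos_angle)))
        error_map.append(out_row)
    return error_map


def inconsistency_score(error_map, threshold_deg=30.0):
    """Fraction of valid pixels whose angular error exceeds the threshold."""
    valid = [e for row in error_map for e in row if e is not None]
    if not valid:
        raise ValueError("no valid pixels to score")
    return sum(e > threshold_deg for e in valid) / len(valid)
```

The error map keeps the per-pixel values, so disagreement can be localized in the image, while the scalar score summarizes one view. Under these assumptions, a multi-view score would simply average `inconsistency_score` over the rendered viewpoints.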
Problem

Research questions and friction points this paper is trying to address.

Assessing quality of 3D assets for visual and geometric consistency
Developing reliable fine-grained evaluation metrics for 3D generation
Overcoming limitations of black-box models in 3D asset assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages diverse foundation models for consistency evaluation
Provides pixel-wise measurement and spatial feedback
Aligns evaluation closely with human judgments