🤖 AI Summary
Current 3D generative systems struggle to simultaneously achieve geometric fidelity, semantic coherence, and visual quality. Moreover, mainstream evaluation metrics either neglect geometric attributes or rely on opaque multimodal large language models, and thus lack fine-grained interpretability and alignment with human perceptual judgment. To address this, we propose the first multi-foundation-model collaborative probing framework specifically designed for assessing 3D generated content. Our framework integrates multi-view rendering, cross-modal feature alignment, geometric consistency verification, and specialized analyzers—including CLIP, depth estimation, SAM, and mesh-based modules—to deliver pixel-level quantification and 3D spatial feedback. The method substantially improves interpretability and alignment with human visual judgment, accurately identifying semantic-geometric inconsistencies across leading 3D generation models. Empirically, it achieves a 32% improvement in Spearman correlation with human judgments over baseline metrics.
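The Spearman correlation used to report that 32% figure is rank-based: it measures how well a metric orders assets the same way humans do. A minimal self-contained sketch (the ratings and metric scores below are illustrative placeholders, not data from the paper):

```python
def ranks(xs):
    """Average ranks (1-based, ties averaged) for a list of scores."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rho = Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = (sum((x - ma) ** 2 for x in ra)
           * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return num / den

# Hypothetical per-asset human quality ratings and automated metric scores.
human = [3.0, 1.0, 4.0, 2.0, 5.0]
metric_scores = [0.6, 0.2, 0.3, 0.7, 0.9]
print(round(spearman(human, metric_scores), 3))  # → 0.6
```

A higher rho means the automated metric ranks assets more like human raters do, which is the sense in which the reported improvement is measured.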
📝 Abstract
Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. Effectively assessing the quality of generated 3D assets therefore requires a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistencies of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.
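The core probing idea, scoring an asset by how much independent foundation models agree across its rendered views, can be sketched in miniature. This is not Eval3D's actual API; the short vectors below stand in for real per-view image embeddings (e.g., CLIP features of each rendering), and the function names are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cross_view_consistency(view_feats):
    """Mean pairwise cosine similarity over all rendered views.

    High values suggest the views depict one semantically coherent
    object; low values flag cross-view inconsistency.
    """
    n = len(view_feats)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(view_feats[i], view_feats[j]) for i, j in pairs) / len(pairs)

# Placeholder embeddings: three views of a coherent asset vs. three
# mutually orthogonal (maximally inconsistent) views.
consistent = [[1.0, 0.1, 0.0, 0.2], [0.9, 0.2, 0.1, 0.1], [1.0, 0.0, 0.1, 0.3]]
inconsistent = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(cross_view_consistency(consistent) > cross_view_consistency(inconsistent))  # → True
```

The same agreement-as-signal pattern extends to geometry, for instance comparing depth estimated from each rendered view against depth derived directly from the generated mesh.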