🤖 AI Summary
Current 3D generative systems struggle to simultaneously achieve geometric fidelity, semantic coherence, and visual quality. Moreover, mainstream evaluation metrics either neglect geometric attributes or rely on opaque multimodal large language models, and thus lack fine-grained interpretability and alignment with human perceptual judgment. To address this, we propose the first multi-foundation-model collaborative probing framework specifically designed for assessing 3D generated content. Our framework integrates multi-view rendering, cross-modal feature alignment, geometric consistency verification, and specialized analyzers—including CLIP, depth estimation, SAM, and mesh-based modules—to deliver pixel-level quantification and 3D spatial feedback. The method substantially improves interpretability and alignment with human visual judgment, accurately identifying semantic-geometric inconsistencies across leading 3D generation models. Empirically, it achieves a 32% improvement in Spearman correlation with human judgments over baseline metrics.
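The Spearman correlation used to report that 32% figure is rank-based: it measures how well a metric orders assets the same way humans do. A minimal self-contained sketch (the ratings and metric scores below are illustrative placeholders, not data from the paper):

```python
def ranks(xs):
    """Average ranks (1-based, ties averaged) for a list of scores."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman rho = Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    den = (sum((x - ma) ** 2 for x in ra)
           * sum((y - mb) ** 2 for y in rb)) ** 0.5
    return num / den

# Hypothetical per-asset human quality ratings and automated metric scores.
human = [3.0, 1.0, 4.0, 2.0, 5.0]
metric_scores = [0.6, 0.2, 0.3, 0.7, 0.9]
print(round(spearman(human, metric_scores), 3))  # → 0.6
```

A higher rho means the automated metric ranks assets more like human raters do, which is the sense in which the reported improvement is measured.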
📝 Abstract
Despite the unprecedented progress in the field of 3D generation, current systems still often fail to produce high-quality 3D assets that are visually appealing and geometrically and semantically consistent across multiple viewpoints. Effectively assessing the quality of generated 3D assets therefore requires a reliable 3D evaluation tool. Unfortunately, existing 3D evaluation metrics often overlook the geometric quality of generated assets or merely rely on black-box multimodal large language models for coarse assessment. In this paper, we introduce Eval3D, a fine-grained, interpretable evaluation tool that can faithfully evaluate the quality of generated 3D assets based on various distinct yet complementary criteria. Our key observation is that many desired properties of 3D generation, such as semantic and geometric consistency, can be effectively captured by measuring the consistency among various foundation models and tools. We thus leverage a diverse set of models and tools as probes to evaluate the inconsistencies of generated 3D assets across different aspects. Compared to prior work, Eval3D provides pixel-wise measurement, enables accurate 3D spatial feedback, and aligns more closely with human judgments. We comprehensively evaluate existing 3D generation models using Eval3D and highlight the limitations and challenges of current models.
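The core probing idea, scoring an asset by how much independent foundation models agree across its rendered views, can be sketched in miniature. This is not Eval3D's actual API; the short vectors below stand in for real per-view image embeddings (e.g., CLIP features of each rendering), and the function names are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cross_view_consistency(view_feats):
    """Mean pairwise cosine similarity over all rendered views.

    High values suggest the views depict one semantically coherent
    object; low values flag cross-view inconsistency.
    """
    n = len(view_feats)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(view_feats[i], view_feats[j]) for i, j in pairs) / len(pairs)

# Placeholder embeddings: three views of a coherent asset vs. three
# mutually orthogonal (maximally inconsistent) views.
consistent = [[1.0, 0.1, 0.0, 0.2], [0.9, 0.2, 0.1, 0.1], [1.0, 0.0, 0.1, 0.3]]
inconsistent = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(cross_view_consistency(consistent) > cross_view_consistency(inconsistent))  # → True
```

The same agreement-as-signal pattern extends to geometry, for instance comparing depth estimated from each rendered view against depth derived directly from the generated mesh.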