Towards Fine-Grained Text-to-3D Quality Assessment: A Benchmark and A Two-Stage Rank-Learning Metric

📅 2025-09-28
🤖 AI Summary
Current text-to-3D (T23D) quality assessment suffers from two major bottlenecks: outdated, fragmented benchmarks, and evaluation metrics with poor robustness and insufficient feature representativeness. To address these, we introduce T23D-CompBench—the first fine-grained, comprehensive benchmark, comprising 3,600 textured 3D meshes and 129,600 human judgments—built on a compositional prompt framework of five components with twelve sub-components. We further design Rank2Score, a two-stage learning-to-rank evaluator: Stage I models fine-grained relative preferences via supervised contrastive regression and curriculum learning; Stage II refines predictions by regressing mean opinion scores (MOS). Rank2Score significantly improves alignment with human perception and additionally provides a differentiable, fine-grained quality reward signal for T23D generation. Experiments demonstrate consistent superiority over state-of-the-art metrics across multiple dimensions and validate its effectiveness in guiding generative model optimization.

📝 Abstract
Recent advances in Text-to-3D (T23D) generative models have enabled the synthesis of diverse, high-fidelity 3D assets from textual prompts. However, existing challenges restrict the development of reliable T23D quality assessment (T23DQA). First, existing benchmarks are outdated, fragmented, and coarse-grained, making fine-grained metric training infeasible. Moreover, current objective metrics exhibit inherent design limitations, resulting in non-representative feature extraction and diminished metric robustness. To address these limitations, we introduce T23D-CompBench, a comprehensive benchmark for compositional T23D generation. We define five components with twelve sub-components for compositional prompts, which are used to generate 3,600 textured meshes from ten state-of-the-art generative models. A large-scale subjective experiment is conducted to collect 129,600 reliable human ratings across different perspectives. Based on T23D-CompBench, we further propose Rank2Score, an effective evaluator with two-stage training for T23DQA. Rank2Score enhances pairwise training via supervised contrastive regression and curriculum learning in the first stage, and subsequently refines predictions using mean opinion scores to achieve closer alignment with human judgments in the second stage. Extensive experiments and downstream applications demonstrate that Rank2Score consistently outperforms existing metrics across multiple dimensions and can additionally serve as a reward function to optimize generative models. The project is available at https://cbysjtu.github.io/Rank2Score/.
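The two-stage training recipe described in the abstract can be sketched with simplified surrogate objectives: a plain margin-based pairwise ranking loss standing in for the paper's supervised contrastive regression in Stage I, and mean-squared error against MOS for Stage II. The function names and the margin value are illustrative assumptions, not the paper's exact formulation.

```python
def stage1_pairwise_loss(scores, mos, margin=0.1):
    """Stage I surrogate: margin ranking loss over all ordered pairs.

    Whenever humans prefer item i over item j (higher MOS), the
    predicted score of i should exceed that of j by at least `margin`;
    violations incur a hinge penalty. This is a simplified stand-in
    for the paper's supervised contrastive regression.
    """
    loss, n_pairs = 0.0, 0
    for i in range(len(mos)):
        for j in range(len(mos)):
            if mos[i] > mos[j]:  # item i is preferred by human raters
                loss += max(0.0, margin - (scores[i] - scores[j]))
                n_pairs += 1
    return loss / max(n_pairs, 1)


def stage2_mos_loss(scores, mos):
    """Stage II: plain MSE regression onto mean opinion scores."""
    return sum((s - m) ** 2 for s, m in zip(scores, mos)) / len(mos)


# Toy example: three assets scored by a model vs. their MOS labels.
preds = [0.9, 0.5, 0.1]
labels = [5, 3, 1]
print(stage1_pairwise_loss(preds, labels))  # 0.0: ranking fully agrees
print(stage2_mos_loss(labels, labels))      # 0.0: exact regression fit
```

In practice both stages would update a shared score network; here the functions only evaluate the two objectives on toy score/MOS lists to show how Stage I cares about ordering while Stage II cares about absolute values.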
Problem

Research questions and friction points this paper is trying to address.

Establishing fine-grained quality assessment for text-to-3D generation models
Addressing outdated and coarse-grained benchmarks for 3D quality evaluation
Developing robust metrics that better align with human perceptual judgments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduced T23D-CompBench benchmark with compositional prompts
Proposed Rank2Score evaluator using two-stage training approach
Enhanced metric robustness via contrastive regression and curriculum learning
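The curriculum-learning idea above can be illustrated with a toy schedule that orders training pairs from easy (large MOS gap, unambiguous ranking) to hard (similar MOS). The phase split below is an assumption made for illustration, not the paper's exact schedule.

```python
def curriculum_pairs(mos, num_phases=3):
    """Toy curriculum over training pairs, easiest first.

    Pairs with a large MOS gap are 'easy' because their ordering is
    obvious; pairs with similar MOS are 'hard'. Each phase trains on a
    growing pool that admits progressively harder pairs.
    """
    pairs = [(i, j) for i in range(len(mos)) for j in range(i + 1, len(mos))]
    # Sort by descending MOS gap: easiest (largest gap) first.
    pairs.sort(key=lambda p: abs(mos[p[0]] - mos[p[1]]), reverse=True)
    phases = []
    for k in range(1, num_phases + 1):
        cut = max(1, round(len(pairs) * k / num_phases))
        phases.append(pairs[:cut])  # pool grows phase by phase
    return phases


# Three assets with MOS 5, 4, 1: pair (0, 2) has the largest gap (4),
# so it is scheduled first; the near-tie (0, 1) enters last.
print(curriculum_pairs([5, 4, 1]))
```

The same schedule generalizes to any batch: ranking losses like the Stage I objective are then computed only over the pairs admitted in the current phase.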
Authors

Bingyang Cui
Shanghai Jiao Tong University, China
Yujie Zhang
Shanghai Jiao Tong University, China
3D Quality Assessment, Geometry Processing, 3D Reconstruction
Qi Yang
University of Missouri-Kansas City, USA
Zhu Li
University of Missouri-Kansas City, USA
Yiling Xu
Shanghai Jiao Tong University, China