SRAM: Shape-Realism Alignment Metric for No Reference 3D Shape Evaluation

📅 2025-12-01
🤖 AI Summary
Assessing the realism of 3D shapes without ground-truth references remains a fundamental challenge. To address it, we propose the first reference-free shape-realism alignment metric. The core idea is to encode 3D mesh geometry into the language token space, use a large language model (LLM) to bridge low-level geometry and high-level human perception of realism, and add a dedicated realism decoder for end-to-end score prediction. To support training and evaluation, we curate RealismGrading, the first publicly available dataset of human-annotated realism scores for diverse 3D shapes. Experiments using k-fold cross-validation show strong agreement with human judgments (Spearman's ρ > 0.85), significantly outperforming conventional metrics such as Chamfer Distance and Fréchet Inception Distance (FID). The method also generalizes across object categories, indicating that it does not rely on domain-specific assumptions.
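The summary reports agreement with human judgments as Spearman's ρ. As a minimal sketch of how such a rank correlation can be computed between predicted and human-annotated realism scores, the following uses only the Python standard library. The function names `average_ranks`, `pearson`, and `spearman_rho` and the example scores are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(predicted, human):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(human))


# Hypothetical scores for four shapes (not data from the paper).
predicted_scores = [0.10, 0.40, 0.35, 0.80]
human_scores = [1, 3, 2, 5]
rho = spearman_rho(predicted_scores, human_scores)
```

Because only the ordering matters, the predicted scores do not need to be on the same scale as the human ratings.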

📝 Abstract
3D generation and reconstruction techniques are widely used in computer games, film, and other content creation areas. As these applications expand, demand is growing for 3D shapes that look truly realistic. Traditional evaluation methods rely on a ground truth to measure mesh fidelity. In many practical cases, however, a shape's realism does not depend on having a ground-truth reference. In this work, we propose a Shape-Realism Alignment Metric that uses a large language model (LLM) as a bridge between mesh shape information and realism evaluation. To achieve this, we adopt a mesh encoding approach that converts 3D shapes into the language token space. A dedicated realism decoder aligns the language model's output with human perception of realism. We also introduce a new dataset, RealismGrading, which provides human-annotated realism scores without the need for ground-truth shapes. The dataset includes shapes generated by 16 different algorithms on over a dozen objects, making it more representative of practical 3D shape distributions. We validate the metric's performance and generalizability through k-fold cross-validation across different objects. Experimental results show that the metric correlates well with human perception, outperforms existing methods, and generalizes well.
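The abstract describes k-fold cross-validation across different objects. One way to read this is that each fold holds out whole objects, so the metric is always tested on objects it has not seen. The sketch below illustrates that kind of split under this assumption. The function name `object_level_folds`, the round-robin assignment of objects to folds, and the object labels are hypothetical and do not come from the paper.

```python
def object_level_folds(sample_objects, k):
    """Yield (train_indices, test_indices) with no object on both sides.

    sample_objects[i] is the object label of sample i. Objects are assigned
    to folds round-robin after sorting, so the split is deterministic.
    """
    objects = sorted(set(sample_objects))
    if not 2 <= k <= len(objects):
        raise ValueError("k must be between 2 and the number of objects")
    for fold in range(k):
        held_out = set(objects[fold::k])
        test_idx = [i for i, obj in enumerate(sample_objects) if obj in held_out]
        train_idx = [i for i, obj in enumerate(sample_objects) if obj not in held_out]
        yield train_idx, test_idx


# Hypothetical object labels, one per generated shape.
labels = ["chair", "chair", "lamp", "mug", "lamp", "mug"]
folds = list(object_level_folds(labels, k=3))
```

Splitting by object rather than by individual shape keeps several generated versions of the same object from appearing in both training and test sets, which would otherwise inflate the measured generalization.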
Problem

Research questions and friction points this paper is trying to address.

Evaluates 3D shape realism without ground truth reference
Aligns mesh shape information with human perception via LLM
Provides a dataset for realism scoring across diverse algorithms
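For contrast with the reference-free setting described above, a reference-based metric such as Chamfer Distance needs a ground-truth shape as its second input. The sketch below shows one common formulation on small point sets, using mean squared nearest-neighbour distance summed over both directions. Conventions vary (squared versus unsquared distances, sum versus mean), and this is not necessarily the variant used in the paper's comparisons. The function name and example points are illustrative.

```python
from math import dist


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Each direction averages the squared distance from every point in one
    set to its nearest neighbour in the other set. Brute force, O(n * m).
    """
    def one_way(source, target):
        return sum(min(dist(p, q) ** 2 for q in target) for p in source) / len(source)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Hypothetical generated shape and its ground-truth reference.
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(generated, reference)
```

The score is defined only relative to `reference`. When no ground-truth shape exists, as with many generated assets, this kind of metric cannot be computed at all.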
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM bridges mesh shape to realism evaluation
Mesh encoding converts 3D shapes into language tokens
Realism decoder aligns LLM output with human perception