🤖 AI Summary
Current NeRF image quality assessment lacks a single metric that is robust across datasets: NeRF-specific artifacts cause mainstream metrics to correlate poorly with human perception. To address this, we propose the first multi-metric evaluation framework integrating DISTS (based on deep feature similarity) and VMAF (based on multi-scale visual fidelity), systematically exploring normalization strategies and linear/nonlinear fusion methods to establish an end-to-end assessment pipeline. Our key contribution is the novel synergistic integration of DISTS and VMAF for NeRF quality evaluation. Extensive validation on the Synthetic and Outdoor datasets demonstrates that all three fusion configurations significantly outperform either individual metric, with an average SROCC improvement of 0.12. Moreover, the fused metrics exhibit stronger cross-dataset generalizability and closer agreement with subjective scores (measured by SROCC and PCC), establishing a more reliable and transferable benchmark for NeRF evaluation.
📝 Abstract
Neural Radiance Fields (NeRFs) have demonstrated significant potential in synthesizing novel viewpoints. Evaluating NeRF-generated outputs, however, remains challenging due to the unique artifacts they exhibit, and no individual metric performs well across all datasets. We hypothesize that combining two successful metrics grounded in different perceptual approaches, Deep Image Structure and Texture Similarity (DISTS) and Video Multi-Method Assessment Fusion (VMAF), can overcome the limitations of each individual metric and achieve improved correlation with subjective quality scores. We experiment with two normalization strategies for the individual metrics and two fusion strategies, evaluating their impact on the resulting correlation with subjective scores. The proposed pipeline is tested on two distinct datasets, Synthetic and Outdoor, and its performance is evaluated across three configurations. We present a detailed analysis comparing the correlation coefficients of the fused metrics and the individual scores against subjective scores, demonstrating the robustness and generalizability of the fusion approach.
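The pipeline described above (normalize each metric, fuse, then correlate with subjective scores) can be sketched minimally as follows. This is an illustrative reconstruction, not the authors' code: the min-max normalization, the equal-weight linear fusion, and all per-scene score values are assumptions made for the example. Note that DISTS is a distance (lower is better) while VMAF is a quality score (higher is better), so DISTS must be inverted before fusion.

```python
import numpy as np
from scipy.stats import spearmanr


def min_max_normalize(scores):
    """Scale scores to [0, 1]; one common normalization choice."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo)


def fuse_linear(dists_scores, vmaf_scores, w=0.5):
    """Linear fusion of normalized DISTS and VMAF.

    DISTS is inverted after normalization so that, like VMAF,
    higher fused values mean better quality. The weight w is a
    hypothetical equal-weight choice, not from the paper.
    """
    d = 1.0 - min_max_normalize(dists_scores)  # invert: distance -> quality
    v = min_max_normalize(vmaf_scores)
    return w * d + (1.0 - w) * v


# Toy per-scene scores and subjective MOS values (made up for illustration).
dists = [0.12, 0.30, 0.25, 0.08, 0.40]
vmaf = [85.0, 52.0, 60.0, 91.0, 40.0]
mos = [4.5, 2.8, 3.2, 4.8, 2.1]

fused = fuse_linear(dists, vmaf)
srocc, _ = spearmanr(fused, mos)  # rank correlation with subjective scores
```

A nonlinear fusion (e.g. a learned regressor over the two normalized scores) slots into the same pipeline by replacing `fuse_linear`; the evaluation step via SROCC/PCC is unchanged.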