🤖 AI Summary
This study addresses the misalignment between automated metrics and human expert preferences in structured 3D reconstruction quality assessment. Methodologically, it introduces three key innovations: (1) an interpretable “unit testing” framework that systematically evaluates seven ideal metric properties, exposing critical flaws in mainstream metrics; (2) the first explanatory bridge linking metric behavior to expert preferences, achieved via expert behavioral modeling and sensitivity analysis to uncover failure mechanisms; and (3) a context-aware metric recommendation mechanism coupled with end-to-end expert score distillation, yielding a lightweight learned metric. Evaluated across three representative 3D reconstruction tasks, the distilled metric achieves Spearman correlations ≥0.92 with expert scores, substantially outperforming conventional metrics.
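To make the “unit testing” idea concrete, below is a minimal sketch of one such test: a desirable property is that a metric's score degrades monotonically as a reconstruction is perturbed further from the ground truth. All names here (`chamfer_distance`, `monotonicity_test`, the noise levels) are illustrative assumptions, not the paper's actual test suite or property list.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import spearmanr

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between two (N, 3) point sets."""
    d_pred = cKDTree(gt).query(pred)[0]   # each pred point -> nearest gt point
    d_gt = cKDTree(pred).query(gt)[0]     # each gt point -> nearest pred point
    return d_pred.mean() + d_gt.mean()

def monotonicity_test(metric, gt, noise_levels, trials=20, seed=0):
    """Unit test: metric values should increase monotonically with noise."""
    rng = np.random.default_rng(seed)
    scores = []
    for sigma in noise_levels:
        vals = [metric(gt + rng.normal(0.0, sigma, gt.shape), gt)
                for _ in range(trials)]
        scores.append(np.mean(vals))
    # Rank correlation between perturbation size and metric value should be ~1.
    rho, _ = spearmanr(noise_levels, scores)
    return rho > 0.99

gt = np.random.default_rng(1).uniform(-1.0, 1.0, size=(2048, 3))
print(monotonicity_test(chamfer_distance, gt, noise_levels=[0.0, 0.01, 0.05, 0.1]))
```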
📝 Abstract
"What cannot be measured cannot be improved"while likely never uttered by Lord Kelvin, summarizes effectively the purpose of this work. This paper presents a detailed evaluation of automated metrics for evaluating structured 3D reconstructions. Pitfalls of each metric are discussed, and a thorough analyses through the lens of expert 3D modelers' preferences is presented. A set of systematic"unit tests"are proposed to empirically verify desirable properties, and context aware recommendations as to which metric to use depending on application are provided. Finally, a learned metric distilled from human expert judgments is proposed and analyzed.