🤖 AI Summary
Traditional distortion metrics such as PSNR struggle to capture human preferences regarding the semantic interpretability of metalens imaging, thereby limiting the optimization of reconstruction models. This work addresses this gap by redefining image quality through the lens of human semantic interpretability and introduces a human-in-the-loop active ranking framework. The proposed approach integrates lightweight semantic priors from vision-language models, an uncertainty-aware query strategy, and a probabilistic preference model to efficiently construct perceptually aligned quality rankings. Evaluated on both real and synthetic datasets, the method achieves significantly better alignment with human judgments than conventional metrics, while requiring only approximately 20% of the pairwise annotations of an exhaustive comparison to attain high ranking consistency with human assessment.
📝 Abstract
Image quality in modern imaging systems emerges from the coupled effects of the sensor, optics, and computational reconstruction. Ultra-thin metalenses offer a path toward substantial miniaturization of optical modules, but practical designs often exhibit pronounced chromatic and field-dependent aberrations that necessitate computational reconstruction. In current metalens pipelines, reconstruction models are commonly trained and selected using distortion-based fidelity objectives, such as PSNR, yet these proxies can be weakly correlated with human preference and downstream utility, reflecting the well-known perception-distortion trade-off. We introduce MetaRanker, a human-in-the-loop active ranking framework that formalizes metalens image quality in terms of semantic interpretability, defined as the degree to which humans can reliably recognize objects and structures in the presence of optical artifacts. MetaRanker combines a probabilistic preference model with uncertainty-aware query selection, and leverages vision-language models to provide lightweight semantic priors. Importantly, these priors are used only to guide the sampling of informative comparisons; human judgments remain the primary supervision signal throughout. Across real-world and synthetic metalens datasets with distinct degradation profiles, MetaRanker produces rankings that align most closely with human assessments, while reducing the number of pairwise annotations required by approximately 80% relative to exhaustive pairwise evaluation. Finally, we show that standard image quality assessment metrics exhibit limited alignment with human interpretability in the metalens domain, positioning MetaRanker as a practical step toward perceptually grounded metalens evaluation and co-design.
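To make the idea of active ranking with a probabilistic preference model concrete, here is a minimal sketch in Python. It is not MetaRanker's implementation: the abstract does not name the preference model or the query rule, so this assumes a Bradley-Terry model and a "closest to 50/50" selection rule. The function names, the `prior` scores standing in for vision-language model output, the simulated `oracle` standing in for a human annotator, and all numeric values are hypothetical. In line with the abstract, the prior only influences which pair is queried next; the fitted scores come from the (simulated) human judgments alone.

```python
import itertools
import math


def win_prob(s_i, s_j):
    """Bradley-Terry probability that item i is preferred over item j."""
    return 1.0 / (1.0 + math.exp(s_j - s_i))


def fit_scores(n, comparisons, steps=200, lr=0.1, reg=0.1):
    """Gradient ascent on the Bradley-Terry log-likelihood with L2 shrinkage.

    `comparisons` is a list of (winner, loser) index pairs from annotators.
    """
    s = [0.0] * n
    for _ in range(steps):
        grad = [-reg * s[k] for k in range(n)]
        for winner, loser in comparisons:
            p = win_prob(s[winner], s[loser])
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        s = [s[k] + lr * grad[k] for k in range(n)]
    return s


def most_uncertain_pair(s, prior, asked):
    """Pick the unasked pair whose predicted outcome is closest to 50/50.

    The prior is added to the fitted scores here only, so it steers
    sampling without acting as a supervision signal.
    """
    guide = [a + b for a, b in zip(s, prior)]
    candidates = [p for p in itertools.combinations(range(len(s)), 2)
                  if p not in asked]
    return min(candidates,
               key=lambda p: abs(win_prob(guide[p[0]], guide[p[1]]) - 0.5))


def active_rank(n, oracle, prior, budget):
    """Query up to `budget` pairs, refitting the scores after each answer."""
    budget = min(budget, n * (n - 1) // 2)
    asked, comparisons = set(), []
    s = [0.0] * n
    for _ in range(budget):
        i, j = most_uncertain_pair(s, prior, asked)
        asked.add((i, j))
        comparisons.append((i, j) if oracle(i, j) else (j, i))
        s = fit_scores(n, comparisons)
    ranking = sorted(range(n), key=lambda k: -s[k])
    return ranking, len(comparisons)


# Toy usage with made-up values: 10 images, so 45 possible pairs.
n = 10
true_quality = [0.3, 2.1, 1.2, 3.3, 0.9, 2.7, 1.8, 3.9, 0.1, 2.4]  # hypothetical
prior = [0.0, 2.0, 1.5, 3.0, 1.0, 2.5, 1.5, 3.5, 0.5, 2.0]         # hypothetical
oracle = lambda i, j: true_quality[i] > true_quality[j]  # simulated annotator

total_pairs = n * (n - 1) // 2   # exhaustive pairwise evaluation
budget = total_pairs // 5        # roughly 20% of all pairs
ranking, n_queries = active_rank(n, oracle, prior, budget)
print(f"queried {n_queries} of {total_pairs} pairs; ranking: {ranking}")
```

The budget arithmetic mirrors the annotation saving reported in the abstract: exhaustive evaluation of n images needs n(n-1)/2 comparisons, so an 80% reduction corresponds to querying about one fifth of them. How closely a ranking produced this way matches human assessment depends on the model, the prior, and the data, and this toy example makes no claim about that.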