🤖 AI Summary
Existing general-purpose image similarity metrics, such as SSIM, PSNR, and LPIPS, struggle to evaluate virtual staining results in histopathology accurately because they overlook domain-specific characteristics such as tissue morphology and biomarker expression. This work addresses the gap in three ways. It formalizes histology image similarity as a standalone problem. It benchmarks a broad set of full-reference metrics against expert-annotated H&E-IHC patch pairs. It proposes the Histology-Aware Perceptual Similarity (HAPS) metric. HAPS extracts features with a frozen pathology-pretrained encoder and uses a linear head to aggregate feature-level differences into a single score aligned with expert assessments. Its design is guided by an analysis of how metrics respond to geometric distortions (shifts, rotations, and non-rigid deformations) that mimic registration errors between serial sections. Experiments show that HAPS aligns with pathologists' judgments and can flag low-quality training pairs: virtual staining models trained on a HAPS-filtered version of the MIST dataset outperform those trained on the original, unfiltered data.
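HAPS is described only at the level of "frozen pathology encoder plus linear aggregation head." The sketch below shows one way such a score could be computed, assuming an LPIPS-style aggregation: per-position unit normalization of each layer's features, squared differences weighted per channel by a non-negative linear head, spatial averaging, and a sum over layers. The function `haps_like_distance`, the layer shapes, and the random weights and features are hypothetical stand-ins, not the authors' encoder or trained head.

```python
import numpy as np


def normalize_channels(feat: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    # Unit-normalize the feature vector at every spatial position; feat is (C, H, W).
    norm = np.sqrt(np.sum(feat ** 2, axis=0, keepdims=True))
    return feat / (norm + eps)


def haps_like_distance(feats_a, feats_b, weights) -> float:
    """Hypothetical LPIPS-style aggregation over per-layer feature maps."""
    total = 0.0
    for fa, fb, w in zip(feats_a, feats_b, weights):
        w = np.maximum(w, 0.0)  # keep the linear head non-negative, as LPIPS does
        diff = (normalize_channels(fa) - normalize_channels(fb)) ** 2  # (C, H, W)
        per_pixel = np.tensordot(w, diff, axes=(0, 0))  # (H, W) channel-weighted sum
        total += float(per_pixel.mean())  # average over spatial positions
    return total  # 0 for identical features; larger means less similar


# Random stand-ins for two layers of a frozen encoder (not real pathology features).
rng = np.random.default_rng(0)
shapes = [(64, 16, 16), (128, 8, 8)]
weights = [rng.uniform(0.0, 1.0, size=c) for c, _, _ in shapes]
feats_ref = [rng.normal(size=s) for s in shapes]
feats_near = [f + 0.1 * rng.normal(size=f.shape) for f in feats_ref]  # slightly perturbed
feats_far = [rng.normal(size=s) for s in shapes]  # unrelated features

d_same = haps_like_distance(feats_ref, feats_ref, weights)
d_near = haps_like_distance(feats_ref, feats_near, weights)
d_far = haps_like_distance(feats_ref, feats_far, weights)
print(d_same, d_near, d_far)
```

In practice the feature maps would come from the frozen encoder applied to the virtually stained and reference patches. Because the encoder stays frozen, only the head weights would be fitted, presumably to the expert similarity scores. The sketch returns a distance; a similarity score could be obtained by negating or otherwise inverting it.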
📝 Abstract
Virtual staining of histopathology images (e.g., H&E-to-IHC) is an emerging tool in digital pathology, enabling faster and cheaper workflows by synthesizing target stains from routinely acquired slides. Yet, the quality of virtual staining models is still predominantly assessed with generic metrics such as SSIM, PSNR, and LPIPS. Originally developed for natural images, these metrics are inherently misaligned with the domain-specific characteristics of histological data, failing to capture tissue morphology preservation and biomarker expression patterns. Consequently, a robust, domain-specific standard for quantifying similarity across diverse histological modalities remains a critical gap in the field. In this work, we formalize histology image similarity as a standalone problem and systematically evaluate a broad set of full-reference metrics against a dataset of H&E-IHC patch pairs annotated with expert similarity scores. We further analyze metric sensitivity to controlled geometric distortions (shifts, rotations, and non-rigid deformations) that mimic realistic registration errors between serial sections. Guided by these observations, we propose the Histology-Aware Perceptual Similarity (HAPS) metric. HAPS computes distances in the feature space of a frozen encoder pretrained on histopathology data, adding a linear head to aggregate feature-level differences into a final score that aligns with expert assessments. Finally, we demonstrate the practical value of HAPS for quality control of training data. By quantifying the similarity of training pairs in the MIST dataset and filtering low-scoring samples, we create a cleaner training set. Virtual staining models trained on this refined data outperform those trained on the original, unfiltered dataset.
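The distortion analysis can be approximated with a small numpy harness. It warps a patch by controlled amounts and records how a metric responds. The sketch makes several assumptions:

* nearest-neighbour resampling with edge clamping;
* a sum of random low-frequency sinusoids as a stand-in for non-rigid deformation;
* PSNR as the metric under test;
* a synthetic smooth image in place of a real H&E or IHC patch.

All function names and distortion magnitudes are hypothetical, not the paper's protocol.

```python
import numpy as np


def grid(shape):
    # Row/column coordinate arrays for an image of the given shape.
    h, w = shape[:2]
    return np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij")


def warp_nearest(img, src_y, src_x):
    # Inverse mapping: each output pixel copies the nearest source pixel, clamped to the border.
    h, w = img.shape[:2]
    yi = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    xi = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    return img[yi, xi]


def shift(img, dy, dx):
    # Rigid translation by (dy, dx) pixels.
    yy, xx = grid(img.shape)
    return warp_nearest(img, yy - dy, xx - dx)


def rotate(img, degrees):
    # Rigid rotation about the image centre.
    yy, xx = grid(img.shape)
    cy, cx = (img.shape[0] - 1) / 2, (img.shape[1] - 1) / 2
    t = np.deg2rad(degrees)
    src_y = cy + (yy - cy) * np.cos(t) - (xx - cx) * np.sin(t)
    src_x = cx + (yy - cy) * np.sin(t) + (xx - cx) * np.cos(t)
    return warp_nearest(img, src_y, src_x)


def elastic(img, amplitude, rng, n_waves=2):
    # Smooth non-rigid warp: displacements are a sum of random low-frequency sinusoids,
    # scaled so the continuous displacement never exceeds `amplitude` along either axis.
    yy, xx = grid(img.shape)
    h, w = img.shape[:2]
    dy = np.zeros_like(yy)
    dx = np.zeros_like(xx)
    for _ in range(n_waves):
        fy, fx = rng.uniform(0.5, 2.0, size=2)
        py, px = rng.uniform(0.0, 2 * np.pi, size=2)
        dy += np.sin(2 * np.pi * fx * xx / w + px)
        dx += np.sin(2 * np.pi * fy * yy / h + py)
    scale = amplitude / n_waves
    return warp_nearest(img, yy + scale * dy, xx + scale * dx)


def psnr(ref, test, data_range=1.0):
    # Peak signal-to-noise ratio in dB; infinite for identical images.
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(data_range ** 2 / mse))


rng = np.random.default_rng(0)
yy, xx = grid((64, 64))
img = 0.5 + 0.25 * np.sin(yy / 5) * np.cos(xx / 7)  # synthetic stand-in for a tissue patch

# Sensitivity curves: metric value versus distortion magnitude for each distortion type.
results = {
    "shift": [(s, psnr(img, shift(img, s, s))) for s in (0, 1, 2, 4)],
    "rotate": [(a, psnr(img, rotate(img, a))) for a in (0, 2, 5, 10)],
    "elastic": [(a, psnr(img, elastic(img, a, rng))) for a in (0, 1, 2, 4)],
}
for kind, curve in results.items():
    print(kind, [(m, round(v, 2)) for m, v in curve])
```

Replacing `psnr` with another full-reference metric produces the same kind of curve. Examples include SSIM, LPIPS, or a feature-space score like the one sketched for HAPS. Comparing these curves shows how quickly each metric degrades as the simulated misregistration grows.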