🤖 AI Summary
This study addresses three key challenges in generating virtual immunohistochemistry (IHC) images from hematoxylin and eosin (H&E) slides: (1) lack of fair evaluation against spatially misaligned real IHC ground truth; (2) structural distortion; and (3) loss of biological variability. We propose Star-Diff, a structure-aware diffusion model that formulates virtual staining as a joint residual-noise recovery task, explicitly preserving tissue-level topology and cell-level semantics. To overcome the limitations of pixel-level metrics, we introduce the Semantic Fidelity Score (SFS), a biologically grounded evaluation metric quantifying clinical reliability via biomarker classification accuracy. On the BCI dataset, Star-Diff achieves state-of-the-art performance: high visual fidelity, superior diagnostic consistency with pathologists, and rapid inference (<1.5 s per slide). These advances significantly enhance the clinical feasibility and interpretability of intraoperative virtual IHC synthesis.
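The summary does not spell out the joint residual-noise formulation, so the following is only a minimal sketch of what such a forward process might look like. It assumes a residual-denoising-style corruption in which the IHC target is shifted toward its H&E source by a residual term and perturbed by Gaussian noise. The function names, the schedule values `alpha_bar_t` and `beta_bar_t`, and the exact blending form are illustrative assumptions, not taken from Star-Diff.

```python
import numpy as np


def forward_residual_noise(ihc, he, alpha_bar_t, beta_bar_t, rng):
    """Corrupt an IHC patch with a residual shift toward H&E plus Gaussian noise.

    Assumed form (hypothetical, not from the paper):
        x_t = ihc + alpha_bar_t * (he - ihc) + beta_bar_t * eps
    """
    residual = he - ihc                     # structure-carrying difference
    noise = rng.standard_normal(ihc.shape)  # stochastic component
    x_t = ihc + alpha_bar_t * residual + beta_bar_t * noise
    return x_t, residual, noise


def recover_target(x_t, residual_hat, noise_hat, alpha_bar_t, beta_bar_t):
    """Invert the assumed forward process given predicted residual and noise."""
    return x_t - alpha_bar_t * residual_hat - beta_bar_t * noise_hat


rng = np.random.default_rng(0)
ihc = rng.random((8, 8, 3))  # stand-in for a real IHC patch
he = rng.random((8, 8, 3))   # stand-in for the paired H&E patch

# With the true residual and noise, the target is recovered exactly.
x_t, residual, noise = forward_residual_noise(ihc, he, 0.6, 0.3, rng)
restored = recover_target(x_t, residual, noise, 0.6, 0.3)
```

In this reading, a network would be trained to predict `residual` and `noise` from `x_t` and the H&E input. The residual pathway carries the deterministic, structure-preserving part of the translation, and the noise pathway leaves room for sample-to-sample variability.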
📝 Abstract
Hematoxylin and eosin (H&E) staining is the clinical standard for assessing tissue morphology, but it lacks molecular-level diagnostic information. In contrast, immunohistochemistry (IHC) provides crucial insights into biomarker expression, such as HER2 status for breast cancer grading, but remains costly and time-consuming, limiting its use in time-sensitive clinical workflows. To address this gap, virtual staining from H&E to IHC has emerged as a promising alternative, yet it faces two core challenges: (1) the lack of fair evaluation of synthetic images against misaligned IHC ground truths, and (2) the difficulty of preserving structural integrity and biological variability during translation. To this end, we present an end-to-end framework that covers both generation and evaluation. We introduce Star-Diff, a structure-aware staining restoration diffusion model that reformulates virtual staining as an image restoration task. By combining residual and noise-based generation pathways, Star-Diff maintains tissue structure while modeling realistic biomarker variability. To evaluate the diagnostic consistency of the generated IHC patches, we propose the Semantic Fidelity Score (SFS), a clinical-grading-task-driven metric that quantifies class-wise semantic degradation based on biomarker classification accuracy. Unlike pixel-level metrics such as SSIM and PSNR, SFS remains robust under spatial misalignment and classifier uncertainty. Experiments on the BCI dataset demonstrate that Star-Diff achieves state-of-the-art (SOTA) performance in both visual fidelity and diagnostic relevance. With rapid inference and strong clinical alignment, it presents a practical solution for applications such as intraoperative virtual IHC synthesis.
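The abstract describes SFS only at a high level, so the sketch below illustrates one possible way to turn "class-wise semantic degradation based on biomarker classification accuracy" into a number. It is an assumption for illustration, not the paper's definition: a grading classifier is applied to both real and generated IHC patches, per-class accuracy on generated patches is divided by per-class accuracy on real patches, and the ratios are averaged. The names `per_class_accuracy` and `semantic_fidelity_sketch` are hypothetical.

```python
import numpy as np


def per_class_accuracy(labels, preds, num_classes):
    """Return accuracy for each class; NaN where a class has no samples."""
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = np.mean(preds[mask] == c)
    return acc


def semantic_fidelity_sketch(labels, preds_real, preds_generated, num_classes):
    """Hypothetical score: mean per-class accuracy retention, clipped to [0, 1].

    Dividing by the accuracy on real IHC is an assumed way to discount
    errors that come from the classifier rather than from the generator.
    """
    acc_real = per_class_accuracy(labels, preds_real, num_classes)
    acc_gen = per_class_accuracy(labels, preds_generated, num_classes)
    valid = np.nan_to_num(acc_real) > 0  # skip empty or never-correct classes
    if not valid.any():
        raise ValueError("no class has non-zero accuracy on real IHC patches")
    retention = np.clip(acc_gen[valid] / acc_real[valid], 0.0, 1.0)
    return float(retention.mean()), retention


# Toy example with four grade classes and two patches per class.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
preds_real = [0, 0, 1, 1, 2, 2, 3, 3]       # classifier on real IHC
preds_generated = [0, 0, 1, 0, 2, 2, 3, 1]  # classifier on generated IHC
score, retention = semantic_fidelity_sketch(labels, preds_real, preds_generated, 4)
```

Because the score depends only on predicted grades and not on pixel correspondence, a measure of this kind is unaffected by spatial misalignment between generated and real patches, which is the property the abstract contrasts with SSIM and PSNR.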