🤖 AI Summary
This work addresses two problems. Conventional immunohistochemical (IHC) staining is time-consuming, costly, and destructive to tissue, and existing virtual staining methods cannot simultaneously preserve fine cellular structures and accurately represent biochemical expression. The authors propose HistDiT, a structure-aware latent conditional Diffusion Transformer that integrates spatial morphology and semantic phenotypic information through a dual-stream conditioning mechanism. The key contributions are the dual-stream conditioning strategy, which jointly guides structural fidelity and semantic accuracy; a multi-objective loss function that sharpens morphological detail; and the Structural Correlation Metric (SCM), which focuses evaluation on the core morphological structure of generated samples. The authors report that HistDiT outperforms existing baselines in both quantitative and qualitative evaluations, producing high-fidelity images with realistic staining textures and preserved cellular architecture.
📝 Abstract
Immunohistochemistry (IHC) is essential for assessing specific immune biomarkers like Human Epidermal growth-factor Receptor 2 (HER2) in breast cancer. However, traditional protocols for obtaining IHC stains are resource-intensive, time-consuming, and prone to causing structural damage to the tissue. Virtual staining has emerged as a scalable alternative, but it faces significant challenges in preserving fine-grained cellular structures while accurately translating biochemical expression. Current state-of-the-art methods still rely on Generative Adversarial Networks (GANs) or standard convolutional U-Net diffusion models, which often struggle with a trade-off between structure and staining: the generated samples are either structurally accurate but blurry, or texturally realistic but marred by artifacts that compromise their diagnostic use. In this paper, we introduce HistDiT, a novel latent conditional Diffusion Transformer (DiT) architecture that establishes a new benchmark for visual fidelity in virtual histological staining. The contributions of this work are: (a) a Dual-Stream Conditioning strategy that explicitly balances spatial constraints, supplied via VAE-encoded latents, against semantic phenotype guidance, supplied via UNI embeddings; (b) a multi-objective loss function that yields sharper images with clear morphological structure; and (c) the Structural Correlation Metric (SCM), which focuses on the core morphological structure for precise assessment of sample quality. Our model outperforms existing baselines, as demonstrated through rigorous quantitative and qualitative evaluations.
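The abstract does not say how the two conditioning streams enter the transformer, so the following is only a minimal sketch of one plausible arrangement, not the HistDiT implementation. It assumes the spatial stream is a channel-wise concatenation of the source-image latent with the noisy target latent before patchification, and that the semantic stream is a single global embedding projected and added to the timestep embedding, as is common in DiT-style models. All function names, shapes, and the projection matrix are hypothetical, and the code uses only the Python standard library so that it runs as is.

```python
import random


def patchify(latent, patch):
    """Flatten a latent stored as [channel][row][col] into one token per patch.

    Assumes height and width are divisible by `patch`.
    """
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch)
                for dx in range(patch)
            ])
    return tokens


def project(vector, weight):
    """Linear map of `vector` through `weight`, stored as [in_dim][out_dim]."""
    out_dim = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vector, weight)) for j in range(out_dim)]


def dual_stream_inputs(noisy_target, source_latent, semantic_emb, timestep_emb,
                       weight, patch=2):
    """Build transformer inputs from the two (assumed) conditioning streams."""
    # Spatial stream (assumption): stack channels so every token carries the
    # source structure at its own location alongside the noisy target.
    stacked = noisy_target + source_latent  # list concatenation along channels
    tokens = patchify(stacked, patch)
    # Semantic stream (assumption): one global vector, projected to the
    # conditioning width and added to the timestep embedding.
    projected = project(semantic_emb, weight)
    cond = [t + p for t, p in zip(timestep_emb, projected)]
    return tokens, cond


def random_latent(rng, channels, size):
    """Hypothetical stand-in for a VAE-encoded latent."""
    return [[[rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]
            for _ in range(channels)]


rng = random.Random(0)
noisy = random_latent(rng, channels=4, size=8)   # noisy target (IHC) latent
source = random_latent(rng, channels=4, size=8)  # source-image latent
semantic = [rng.gauss(0, 1) for _ in range(16)]  # stand-in for a UNI embedding
timestep = [rng.gauss(0, 1) for _ in range(8)]   # stand-in timestep embedding
weight = [[rng.gauss(0, 0.02) for _ in range(8)] for _ in range(16)]

tokens, cond = dual_stream_inputs(noisy, source, semantic, timestep, weight)
print(len(tokens), len(tokens[0]), len(cond))
```

In this arrangement the spatial information stays aligned with each token's position, while the semantic vector modulates every token equally. That is one way to read the "balance" between the two streams described above; the sizes here are toy values chosen for illustration.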