When normalization hallucinates: unseen risks in AI-powered whole slide image processing

📅 2025-12-08
🤖 AI Summary
This work identifies a long-overlooked “hallucination” risk in AI-driven whole-slide image (WSI) normalization: models generate visually imperceptible yet structurally spurious artifacts in clinical histopathology images, severely compromising diagnostically critical features. Because conventional evaluation metrics cannot detect such latent hallucinations, the authors propose a hallucination quantification metric based on deep feature contrast and conduct the first systematic retraining and evaluation of mainstream normalization models on real-world clinical WSI datasets. Results reveal pervasive hallucination across multiple established methods, while traditional metrics, including MSE and SSIM, fail entirely to capture these distortions. The study establishes a new paradigm for hallucination assessment in WSI normalization and provides both a critical reliability warning and a technical foundation for preprocessing pipelines in computational pathology.

📝 Abstract
Whole slide image (WSI) normalization remains a vital preprocessing step in computational pathology. Increasingly driven by deep learning, these models learn to approximate data distributions from training examples. This often results in outputs that gravitate toward the average, potentially masking diagnostically important features. More critically, they can introduce hallucinated content, artifacts that appear realistic but are not present in the original tissue, posing a serious threat to downstream analysis. These hallucinations are nearly impossible to detect visually, and current evaluation practices often overlook them. In this work, we demonstrate that the risk of hallucinations is real and underappreciated. While many methods perform adequately on public datasets, we observe a concerning frequency of hallucinations when these same models are retrained and evaluated on real-world clinical data. To address this, we propose a novel image comparison measure designed to automatically detect hallucinations in normalized outputs. Using this measure, we systematically evaluate several well-cited normalization methods retrained on real-world data, revealing significant inconsistencies and failures that are not captured by conventional metrics. Our findings underscore the need for more robust, interpretable normalization techniques and stricter validation protocols in clinical deployment.
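The abstract's core idea is that pixel-level metrics (MSE, SSIM) can look fine while the normalized patch has structurally diverged from the original, so the proposed measure compares patches in a deep feature space instead. The sketch below illustrates that principle only: the fixed random projection is a stand-in for the pretrained CNN feature extractor such a measure would use in practice, and the cosine-distance formulation and the `hallucination_score` name are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def extract_features(patch, rng_seed=0):
    """Project a 2D patch into a 64-dim feature vector.

    Placeholder: a fixed random linear projection stands in for the
    pretrained CNN backbone (e.g. VGG/ResNet features) that a real
    deep-feature-contrast measure would use.
    """
    rng = np.random.default_rng(rng_seed)  # fixed seed -> same "network" every call
    w = rng.standard_normal((patch.size, 64))
    return patch.ravel() @ w

def hallucination_score(original, normalized):
    """Cosine distance between feature vectors of two patches.

    ~0  -> structurally identical content
    near 1 -> structurally divergent content (possible hallucination),
    even if pixel-level statistics of the two patches are similar.
    """
    f0 = extract_features(original)
    f1 = extract_features(normalized)
    cos = f0 @ f1 / (np.linalg.norm(f0) * np.linalg.norm(f1) + 1e-12)
    return 1.0 - cos
```

An unchanged patch scores near zero, while a patch whose structure was altered scores markedly higher; thresholding such a score per tile is one way a pipeline could flag suspect normalization outputs for review.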
Problem

Research questions and friction points this paper is trying to address.

AI normalization in pathology risks hallucinating realistic artifacts
Current methods fail to detect hallucinations in clinical data
Need robust techniques and validation to prevent diagnostic errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel image comparison measure detects hallucinations
Systematic evaluation reveals normalization method failures
Proposes robust interpretable techniques for clinical deployment
Karel Moens
Barco NV, Beneluxpark 21, Kortrijk, Belgium
Matthew B. Blaschko
PSI, KU Leuven, Belgium
Tinne Tuytelaars
KU Leuven - PSI, Belgium (computer vision, continual learning)
Bart Diricx
Barco NV, Beneluxpark 21, Kortrijk, Belgium
Jonas De Vylder
Barco NV, Beneluxpark 21, Kortrijk, Belgium
Mustafa Yousif
Michigan Medicine, Ann Arbor, MI, USA