Open-source framework for detecting bias and overfitting for large pathology images

📅 2025-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the susceptibility of deep learning models for whole-slide image (WSI) analysis to non-semantic shortcuts, such as background color and brightness biases, which lead to overfitting and poor generalization. We propose the first model-agnostic, lightweight, and plug-and-play framework for shortcut detection and diagnosis. Our method integrates gradient masking, perturbation sensitivity analysis, and self-supervised contrastive learning to enable interpretable, architecture- and task-agnostic bias identification. It runs on a single consumer-grade GPU. We identify possible latent data shortcuts in a pathology foundation model and reproduce known shortcuts previously observed in self-supervised models. The open-source toolkit is released on GitHub and is intended to improve model robustness and clinical applicability.

📝 Abstract

Even foundation models trained on datasets with billions of samples may develop shortcuts that lead to overfitting and bias. Shortcuts are non-relevant patterns in the data, such as background color or color intensity. To ensure the robustness of deep learning applications, methods are needed to detect and remove such shortcuts. Today's model debugging methods are time-consuming because they often require customization to fit a given model architecture in a specific domain. We propose a generalized, model-agnostic framework to debug deep learning models. We focus on histopathology, where very large images require large models and therefore large computational resources. Even so, our framework can be run on a workstation with a commodity GPU. We demonstrate that it can replicate non-image shortcuts found in previous work on self-supervised learning models, and we also identify possible shortcuts in a foundation model. Our easy-to-use tests contribute to the development of more reliable, accurate, and generalizable models for whole-slide image (WSI) analysis. The framework is available as an open-source tool on GitHub.
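To make the idea of a shortcut test concrete, here is a minimal sketch in Python with NumPy. It is not the framework's actual implementation. It assumes a hypothetical `predict` callable that maps a batch of tiles (floats in [0, 1], shape `(N, H, W, 3)`) to class labels, and it measures how often predictions flip when only the brightness changes. The names `shift_brightness`, `prediction_flip_rate`, and `brightness_shortcut_model` are illustrative and do not come from the paper.

```python
import numpy as np


def shift_brightness(tiles, delta):
    """Add a constant offset to every pixel and clip back to [0, 1]."""
    return np.clip(tiles + delta, 0.0, 1.0)


def prediction_flip_rate(predict, tiles, perturb):
    """Fraction of tiles whose predicted class changes under `perturb`."""
    before = predict(tiles)
    after = predict(perturb(tiles))
    return float(np.mean(before != after))


def brightness_shortcut_model(tiles):
    """Toy stand-in for a real model (assumption): it classifies by mean
    brightness alone, which is exactly the kind of shortcut to expose."""
    return (tiles.mean(axis=(1, 2, 3)) > 0.5).astype(int)


rng = np.random.default_rng(0)
# Synthetic tiles with pixel values in [0.3, 0.5), so every tile starts as class 0.
tiles = rng.uniform(0.3, 0.5, size=(32, 64, 64, 3))

flip_rate = prediction_flip_rate(
    brightness_shortcut_model,
    tiles,
    lambda t: shift_brightness(t, 0.3),
)
print(f"flip rate under brightness shift: {flip_rate:.2f}")
```

For this toy model every prediction flips, because brightness is the only thing it looks at. A model that relies on tissue morphology should show a flip rate close to zero under the same perturbation. Because the test only calls `predict`, it does not depend on the model architecture.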

Problem

Research questions and friction points this paper is trying to address.

Detecting bias and overfitting in large pathology images

Identifying non-relevant patterns causing model shortcuts

Providing a model-agnostic framework for debugging deep learning models
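A non-relevant pattern can also be looked for in the data itself, before any model is involved. The sketch below is an illustration under stated assumptions, not the paper's method. It asks how well a single trivial statistic (mean tile brightness) separates the labels, using a rank-based AUC. The helper `auc_from_scores` and the synthetic tiles are hypothetical.

```python
import numpy as np


def auc_from_scores(scores, labels):
    """Rank-based AUC: probability that a random positive sample scores
    higher than a random negative one, with ties counted as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


rng = np.random.default_rng(0)
# Synthetic dataset (assumption): positive tiles happen to be brighter,
# for example because they were scanned with different settings.
neg_tiles = rng.uniform(0.2, 0.4, size=(20, 32, 32, 3))
pos_tiles = rng.uniform(0.6, 0.8, size=(20, 32, 32, 3))
tiles = np.concatenate([neg_tiles, pos_tiles])
labels = np.concatenate([np.zeros(20, dtype=int), np.ones(20, dtype=int)])

# One number per tile: its mean brightness.
brightness = tiles.mean(axis=(1, 2, 3))
auc = auc_from_scores(brightness, labels)
print(f"AUC of mean brightness as a label predictor: {auc:.2f}")
```

An AUC near 0.5 suggests the statistic carries no label information. A value near 0 or 1, as in this synthetic case, indicates that a model could solve the task from brightness alone and may learn to do so.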

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized model-agnostic debugging framework

Detects bias and overfitting in large pathology images

Open-source tool for reliable WSI analysis
