🤖 AI Summary
In psoriasis clinical trials, automatic severity scoring from remote smartphone images is vulnerable to spurious correlations induced by confounding factors—including variable lighting, heterogeneous backgrounds, device-specific artifacts, and inter-rater annotation inconsistencies. To address this, we propose an unsupervised training-sample diagnosis method grounded in gradient-based interpretability—enabling precise identification of spurious patterns and annotation-conflict samples without additional labels. Our approach integrates a ConvNeXT-based weakly supervised architecture with gradient-tracing analysis to support counterfactual attribution for misclassified samples and automated flagging of problematic images. Removing only 8.2% of low-quality or high-conflict samples improves test-set AUC-ROC by 5 percentage points (85% → 90%). On a dual-physician-annotated subset, the top 30% highest-risk samples identified by our method encompass over 90% of annotation disagreements—demonstrating substantial gains in model robustness and clinical reliability.
📝 Abstract
Psoriasis (PsO) severity scoring is important for clinical trials but is hindered by inter-rater variability and the burden of in-person clinical evaluation. Remote imaging using patient-captured mobile photos offers scalability but introduces challenges, such as variations in lighting, background, and device quality, that are often imperceptible to humans yet can impact model performance. These factors, along with inconsistencies in dermatologist annotations, reduce the reliability of automated severity scoring. We propose a framework that uses a gradient-based interpretability approach to automatically flag problematic training images that introduce spurious correlations and degrade model generalization. By tracing the gradients of misclassified validation images, we detect training samples where model errors align with inconsistently rated examples or are affected by subtle, non-clinical artifacts. We apply this method to a ConvNeXT-based weakly supervised model designed to classify PsO severity from phone images. Removing 8.2% of flagged images improves model AUC-ROC by 5 percentage points (85% to 90%) on a held-out test set. Commonly, multiple annotators and an adjudication process ensure annotation accuracy, which is expensive and time consuming. Our method detects training images with annotation inconsistencies, potentially removing the need for manual review. When applied to a subset of training data rated by two dermatologists, the method identifies over 90% of cases with inter-rater disagreement by reviewing only the top 30% of samples. This improves automated scoring for remote assessments, ensuring robustness despite data collection variability.
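The abstract does not give the exact attribution formula, but the described idea — tracing gradients of misclassified validation images back to the training samples most aligned with them — can be sketched with a TracIn-style gradient dot product. The sketch below is a hypothetical simplification using a linear logistic model in NumPy; the function names, the single-checkpoint scoring, and the `top_frac` threshold are all assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def per_sample_grads(X, y, w):
    # Per-sample gradient of the logistic loss w.r.t. weights w:
    # grad_i = (sigmoid(x_i @ w) - y_i) * x_i, one row per sample.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X                      # shape (n, d)

def flag_training_samples(X_tr, y_tr, X_val_mis, y_val_mis, w, top_frac=0.3):
    """Rank training samples by gradient alignment with misclassified
    validation samples (hypothetical stand-in for the paper's method)."""
    g_tr = per_sample_grads(X_tr, y_tr, w)
    g_val = per_sample_grads(X_val_mis, y_val_mis, w)
    # Influence score: dot product with the summed misclassified-val gradient.
    scores = g_tr @ g_val.sum(axis=0)                # shape (n_train,)
    k = max(1, int(top_frac * len(X_tr)))
    # Highest scores = training samples most implicated in the errors.
    return np.argsort(scores)[::-1][:k]
```

In a real deep model the per-sample gradients would be taken over the final-layer parameters of the ConvNeXT backbone (or averaged over checkpoints, as in TracIn), and the top-ranked fraction would then be reviewed or removed, mirroring the 30%-review / 8.2%-removal protocol described above.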