TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading

📅 2025-12-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address two clinical bottlenecks in automated diabetic retinopathy (DR) grading, scarce pixel-level annotations and limited model interpretability, this paper proposes TWLR, a two-stage weakly supervised framework. In Stage I, ophthalmology-domain knowledge encoded as text embeddings is fused with visual features to jointly perform DR grading and lesion classification. In Stage II, text-guided weakly supervised semantic segmentation is coupled with iterative severity regression: refined lesion saliency maps drive a progressive inpainting step that removes pathological features, which localizes lesions without pixel-level labels and visualizes the transition from pathological to healthier fundus appearance. Key contributions include domain-knowledge-enhanced vision-language modeling, inpainting-driven erasure of pathological features, and iteratively refined saliency map generation. Evaluated on FGADR, DDR, and a private dataset, the method achieves competitive performance in both DR grading and lesion segmentation while improving interpretability and reducing annotation requirements.

📝 Abstract

Accurate medical image analysis can greatly assist clinical diagnosis, but its effectiveness relies on high-quality expert annotations. Obtaining pixel-level labels for medical images, particularly fundus images, remains costly and time-consuming. Meanwhile, despite the success of deep learning in medical imaging, the lack of interpretability limits its clinical adoption. To address these challenges, we propose TWLR, a two-stage framework for interpretable diabetic retinopathy (DR) assessment. In the first stage, a vision-language model integrates domain-specific ophthalmological knowledge into text embeddings to jointly perform DR grading and lesion classification, effectively linking semantic medical concepts with visual features. The second stage introduces an iterative severity regression framework based on weakly-supervised semantic segmentation. Lesion saliency maps generated through iterative refinement direct a progressive inpainting mechanism that systematically eliminates pathological features, effectively downgrading disease severity toward healthier fundus appearances. Critically, this severity regression approach offers two benefits: accurate lesion localization without pixel-level supervision, and an interpretable visualization of disease-to-healthy transformations. Experimental results on FGADR, DDR, and a private dataset demonstrate that TWLR achieves competitive performance in both DR classification and lesion segmentation, offering a more explainable and annotation-efficient solution for automated retinal image analysis.
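As a rough illustration of how text embeddings can be linked to visual features for grading, the sketch below scores one image feature against a set of text embeddings using cosine similarity followed by a softmax. This is a generic vision-language scoring pattern, not the paper's architecture: the function names, the temperature value, the five-grade setup, and the random stand-in embeddings are all assumptions made for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def text_guided_scores(image_feat, text_embeds, temperature=0.07):
    """Softmax over cosine similarities between one image feature (D,)
    and K text embeddings (K, D). The temperature is an assumed value."""
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_embeds)
    logits = txt @ img / temperature
    logits = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical stand-ins: random vectors in place of real encoder outputs.
rng = np.random.default_rng(0)
grade_text = rng.normal(size=(5, 16))       # assumed: one embedding per DR grade description
image_feat = grade_text[3] + 0.1 * rng.normal(size=16)  # image feature close to grade 3

probs = text_guided_scores(image_feat, grade_text)
predicted_grade = int(probs.argmax())
print(predicted_grade, probs.round(3))
```

In a real system the stand-in vectors would come from trained image and text encoders, and the same scoring could be repeated with lesion-type descriptions to obtain lesion classification scores.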

Problem

Research questions and friction points this paper is trying to address.

Automates diabetic retinopathy grading with text-guided lesion localization

Reduces reliance on costly pixel-level annotations in medical imaging

Enhances interpretability of deep learning models for clinical adoption

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language model integrates ophthalmological knowledge into text embeddings

Iterative severity regression framework uses weakly-supervised semantic segmentation

Progressive inpainting mechanism downgrades disease severity for interpretable visualization
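The loop structure behind iterative severity regression can be sketched on synthetic data: estimate saliency, mask the most salient pixels, inpaint them, re-score severity, and repeat until the image is scored as healthy. Everything in this sketch is a toy stand-in chosen for illustration. The severity score, saliency function, and mean-fill inpainting are hypothetical placeholders for the learned grader, the refined saliency maps, and the inpainting model described above.

```python
import numpy as np

def toy_severity(image):
    """Hypothetical severity score: fraction of bright, lesion-like pixels."""
    return float((image > 0.5).mean())

def toy_saliency(image):
    """Hypothetical saliency: how far each pixel lies above the image median."""
    return np.clip(image - np.median(image), 0.0, None)

def toy_inpaint(image, mask):
    """Hypothetical inpainting: fill masked pixels with the mean of unmasked ones."""
    out = image.copy()
    if mask.any() and (~mask).any():
        out[mask] = image[~mask].mean()
    return out

def iterative_severity_regression(image, max_steps=10, top_fraction=0.005):
    """Repeatedly erase the most salient pixels until severity reaches zero.
    Returns the final image, the accumulated lesion mask, and the severity history."""
    current = image.copy()
    lesion_mask = np.zeros(image.shape, dtype=bool)
    history = [toy_severity(current)]
    for _ in range(max_steps):
        if history[-1] == 0.0:              # scored as healthy: stop
            break
        sal = toy_saliency(current)
        threshold = np.quantile(sal, 1.0 - top_fraction)
        mask = (sal >= threshold) & (sal > 0)
        lesion_mask |= mask                 # erased regions double as localization
        current = toy_inpaint(current, mask)
        history.append(toy_severity(current))
    return current, lesion_mask, history

# Synthetic "fundus": dim background with two bright blobs standing in for lesions.
rng = np.random.default_rng(0)
fundus = 0.2 + 0.02 * rng.random((32, 32))
fundus[5:8, 5:8] = 0.9
fundus[20:22, 10:13] = 0.8

healthy_like, lesion_mask, history = iterative_severity_regression(fundus)
print(history, int(lesion_mask.sum()))
```

The union of the erased regions serves as the lesion localization, and the sequence of intermediate images gives the disease-to-healthy visualization, which is why no pixel-level labels are needed in this scheme.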
