Glo-VLMs: Leveraging Vision-Language Models for Fine-Grained Diseased Glomerulus Classification

📅 2025-08-21
🤖 AI Summary
Problem: Glomerular lesion subtypes differ only subtly in morphology, and labeled examples are extremely scarce (eight samples per class in the reported setting), which makes fine-grained renal pathology classification difficult. Method: This paper proposes Glo-VLMs, described as the first framework to systematically investigate vision-language models (VLMs) for few-shot, fine-grained renal histopathology classification. It uses joint image-text representation learning to align pathological visual patterns with domain-specific clinical terminology, and it introduces a medical-domain-aware few-shot fine-tuning strategy. Results: On a real-world renal biopsy dataset with 8 shots per class, Glo-VLMs achieves 0.742 accuracy, 0.905 macro-AUC, and 0.528 F1-score, which the summary reports as substantially better than existing methods. The work indicates that VLMs can generalize to low-resource, specialized medical diagnosis tasks and presents a terminology-driven approach to interpretable fine-grained pathological classification.
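Aligning visual patterns with clinical terminology is commonly done by comparing an image embedding with the embeddings of one text prompt per class. The sketch below illustrates that general idea only; it is not the authors' implementation. The class names (`subtype_a`, `subtype_b`), the 3-dimensional toy embeddings, and the `temperature` value are all assumptions for illustration, and a real system would obtain the embeddings from a pretrained VLM's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(image_emb, prompt_embs, temperature=0.07):
    """Return (best_label, probabilities) from image-to-prompt similarities."""
    labels = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[name]) / temperature for name in labels]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Toy embeddings (assumed); a real encoder would produce these vectors.
prompt_embs = {
    "subtype_a": [1.0, 0.0, 0.0],
    "subtype_b": [0.0, 1.0, 0.0],
}
image_emb = [0.9, 0.1, 0.0]

label, probs = classify(image_emb, prompt_embs)
print(label, probs)
```

Few-shot fine-tuning would then adjust the encoders (or a small set of added parameters) so that images of each subtype score highest against their own prompt.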

📝 Abstract
Vision-language models (VLMs) have shown considerable potential in digital pathology, yet their effectiveness remains limited for fine-grained, disease-specific classification tasks such as distinguishing between glomerular subtypes. The subtle morphological variations among these subtypes, combined with the difficulty of aligning visual patterns with precise clinical terminology, make automated diagnosis in renal pathology particularly challenging. In this work, we introduce Glo-VLMs, a systematic framework for exploring how large pretrained VLMs can be adapted to fine-grained glomerular classification in data-constrained settings, where only a small number of labeled examples are available. Our approach leverages curated pathology images alongside clinical text prompts to facilitate joint image-text representation learning for nuanced renal pathology subtypes. By assessing various VLM architectures and adaptation strategies under a few-shot learning paradigm, we examine how both the choice of method and the amount of labeled data affect model performance in clinically relevant scenarios. To ensure a fair comparison, we evaluate all models using standardized multi-class metrics, aiming to clarify the practical requirements and potential of large pretrained models for specialized clinical research applications. With only 8 shots per class, the fine-tuned VLMs achieved 0.7416 accuracy, 0.9045 macro-AUC, and 0.5277 F1-score, demonstrating that foundation models can be effectively adapted for fine-grained medical image classification even under highly limited supervision.
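The three reported metrics can be computed from predicted class probabilities as sketched below. This is an illustrative, standard-library version under stated assumptions: macro-AUC is taken as the unweighted mean of one-vs-rest AUCs, and F1 is macro-averaged (the abstract does not state which averaging was used). The toy labels and probabilities are made up for the example and are unrelated to the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def binary_auc(is_positive, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(y_true, probs, classes):
    """Unweighted mean of one-vs-rest AUCs; probs[i][k] is P(class k) for sample i."""
    aucs = [
        binary_auc([t == c for t in y_true], [row[k] for row in probs])
        for k, c in enumerate(classes)
    ]
    return sum(aucs) / len(aucs)

# Toy three-class example (assumed values).
classes = [0, 1, 2]
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
y_pred = [max(classes, key=lambda k: row[k]) for row in probs]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, classes)
auc = macro_auc(y_true, probs, classes)
print(acc, f1, auc)
```

AUC is computed from the probabilities rather than the hard predictions, which is why a model can show a high macro-AUC alongside a much lower F1-score, as in the numbers reported above.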
Problem

Research questions and friction points this paper is trying to address.

Adapting vision-language models for fine-grained glomerular disease classification
Addressing subtle morphological variations in renal pathology subtypes
Enabling accurate diagnosis with limited labeled medical image data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging vision-language models for fine-grained classification
Using curated images and text prompts for joint learning
Adapting VLMs with few-shot learning in data-constrained settings
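A few-shot setting such as the 8-shots-per-class configuration mentioned in the abstract can be set up by drawing a fixed number of labeled examples from each class. The sketch below shows one way to do this; the function name `sample_k_shot`, the placeholder class names, and the pool size of 20 images per class are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict

def sample_k_shot(samples, k, seed=0):
    """Split (item_id, label) pairs into a k-per-class support set and the remainder.

    Assumes item ids are unique and hashable.
    """
    by_label = defaultdict(list)
    for item_id, label in samples:
        by_label[label].append(item_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    support, remainder = [], []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < k:
            raise ValueError(f"class {label!r} has only {len(items)} samples, need {k}")
        chosen = set(rng.sample(items, k))
        for item_id in items:
            target = support if item_id in chosen else remainder
            target.append((item_id, label))
    return support, remainder

# Hypothetical pool: 3 placeholder classes with 20 image ids each.
pool = [
    (f"img_{label}_{i}", label)
    for label in ("subtype_a", "subtype_b", "subtype_c")
    for i in range(20)
]
support, remainder = sample_k_shot(pool, k=8)
print(len(support), len(remainder))
```

Repeating the draw with different seeds and different values of `k` is a simple way to study how the amount of labeled data affects performance.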
Authors

Zhenhao Guo
New York University, New York, NY 10012, USA

Rachit Saluja
Cornell University, Cornell Tech & Weill Cornell Medicine
Deep Learning, AI for Medicine

Tianyuan Yao
Vanderbilt University
Machine Learning, medical image processing

Quan Liu
Vanderbilt University, Nashville, TN, 37235, USA

Yuankai Huo
Computer Science, Vanderbilt University
Medical Image Analysis, Deep Learning, Data Mining

Benjamin Liechty
Assistant Professor of Neuropathology, Weill-Cornell Medical College
neuropathology, neurooncology, machine learning, computer vision, molecular pathology

David J. Pisapia
Weill Cornell Medicine, New York, NY 10065, USA

Kenji Ikemura
Weill Cornell Medicine - New York Presbyterian
Biomedical Engineering, Molecular Pathology, Clinical Informatics

Mert R. Sabuncu
Cornell University, Cornell Tech, Weill Cornell Medicine
AI for medical imaging, medical image computing, medical image analysis, machine learning

Yihe Yang
Northwell Health
Renal Pathology, Anatomic and Clinical Pathology, Epidemiology and Biostatistics, Nephrology, Clinical Research

Ruining Deng
Weill Cornell Medicine
Medical Image Analysis, Deep Learning, Digital Pathology