🤖 AI Summary
Problem: Fine-grained classification of brain tumor subtypes from histopathological whole slide images is challenged by subtle morphological differences and scarce annotated data, which limits zero-shot subtype discrimination. Method: We propose the Fine-Grained Patch Alignment Network (FG-PAN), which enhances the discriminative representation of critical pathological regions via a local feature refinement module and leverages large language models to generate pathology-aware, class-specific textual prototypes. Aligning the refined visual features with these fine-grained descriptions increases class separability in both visual and semantic spaces. Unlike prevailing vision-language models that rely on coarse-grained semantics, FG-PAN explicitly encodes interpretable histopathological features such as tissue architecture and cellular atypia. Contribution/Results: Evaluated on public pathology datasets (EBRAINS, TCGA), FG-PAN achieves state-of-the-art zero-shot classification performance, generalizes robustly across datasets, and grounds its predictions in pathology-aware descriptions.
📝 Abstract
The fine-grained classification of brain tumor subtypes from histopathological whole slide images is highly challenging due to subtle morphological variations and the scarcity of annotated data. Although vision-language models have enabled promising zero-shot classification, their ability to capture fine-grained pathological features remains limited, resulting in suboptimal subtype discrimination. To address these challenges, we propose the Fine-Grained Patch Alignment Network (FG-PAN), a novel zero-shot framework tailored for digital pathology. FG-PAN consists of two key modules: (1) a local feature refinement module that enhances patch-level visual features by modeling spatial relationships among representative patches, and (2) a fine-grained text description generation module that leverages large language models to produce pathology-aware, class-specific semantic prototypes. By aligning refined visual features with LLM-generated fine-grained descriptions, FG-PAN effectively increases class separability in both visual and semantic spaces. Extensive experiments on multiple public pathology datasets, including EBRAINS and TCGA, demonstrate that FG-PAN achieves state-of-the-art performance and robust generalization in zero-shot brain tumor subtype classification.
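The alignment step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the similarity-weighted refinement (a stand-in for the local feature refinement module, which the abstract says models spatial relationships among patches), and mean pooling into a slide embedding are all assumptions made for illustration. It assumes patch features and class text prototypes have already been embedded into a shared space by some vision-language encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def refine_patches(patch_feats, temperature=1.0):
    """Hypothetical refinement: add to each patch a similarity-weighted
    average of all patches (a residual, weight-free attention step)."""
    z = l2_normalize(patch_feats)
    sim = z @ z.T / temperature               # (n_patches, n_patches)
    sim -= sim.max(axis=1, keepdims=True)     # numerical stability for softmax
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    return patch_feats + attn @ patch_feats

def zero_shot_predict(patch_feats, text_prototypes):
    """Return the index of the best-matching class prototype and all scores.

    patch_feats:     (n_patches, dim) visual features of representative patches
    text_prototypes: (n_classes, dim) embeddings of class descriptions
    """
    refined = l2_normalize(refine_patches(patch_feats))
    slide_emb = l2_normalize(refined.mean(axis=0))  # assumed pooling: mean
    protos = l2_normalize(text_prototypes)
    scores = protos @ slide_emb                     # cosine similarity per class
    return int(np.argmax(scores)), scores
```

Because no classifier is trained on labeled slides, adding a new subtype in this sketch only requires a new text prototype, which is the property that makes the setting zero-shot.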