Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease

📅 2025-10-23
🤖 AI Summary
Current AI methods for identifying the myenteric plexus in Hirschsprung’s disease lack interpretability and clinical consensus. Method: We propose a multimodal vision-language model integrating medical prior knowledge—specifically, expert-validated textual concepts derived from biomedical literature—as semantic guidance within a CLIP-style contrastive learning framework; visual features are extracted from histopathological whole-slide images using QuiltNet, and prompt engineering is optimized via large language models. Contribution/Results: Our approach significantly enhances decision transparency and inter-pathologist diagnostic consistency. On a standard benchmark dataset, it achieves 83.9% accuracy, 86.6% precision, and 87.6% specificity—substantially outperforming conventional CNN baselines (e.g., VGG-19, ResNet-18/50). This work establishes a novel paradigm for interpretable, knowledge-informed AI in computational pathology.

📝 Abstract
Hirschsprung's disease is the congenital absence of ganglion cells in one or more segments of the colon. Without these cells, the affected segment cannot make the coordinated movements needed to propel stool, most commonly leading to obstruction. Diagnosis and treatment require clear identification of the regions of the myenteric plexus, where ganglion cells should be present, in microscopic views of tissue slides. While deep learning approaches such as Convolutional Neural Networks have performed well on this task, they are often treated as black boxes, offer little insight into their decisions, and may not conform to how a physician reasons. In this study, we propose a novel framework that integrates expert-derived textual concepts into a Contrastive Language-Image Pre-training (CLIP)-based vision-language model to guide plexus classification. Prompts are generated by large language models from expert sources (e.g., medical textbooks and papers), reviewed by our team, and then encoded with QuiltNet, so that our approach aligns clinically relevant semantic cues with visual features. Experimental results show that the proposed model achieves superior discriminative capability across classification metrics, outperforming CNN-based models including VGG-19, ResNet-18, and ResNet-50, with an accuracy of 83.9%, a precision of 86.6%, and a specificity of 87.6%. These findings highlight the potential of multi-modal learning in histopathology and underscore the value of incorporating expert knowledge for more clinically relevant model outputs.
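At inference time, a CLIP-style model of the kind the abstract describes typically classifies a tissue patch by comparing its image embedding with the embeddings of the class prompts. The sketch below is a minimal, generic illustration of that scoring step; random tensors stand in for actual QuiltNet features, and the function name, embedding dimension, and temperature are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def classify_by_prompt_similarity(image_emb: torch.Tensor,
                                  prompt_embs: torch.Tensor,
                                  temperature: float = 0.07) -> torch.Tensor:
    """Score image embeddings against class prompt embeddings.

    image_emb:   (N, D) image features (e.g., from an image encoder)
    prompt_embs: (C, D) text features, one per class prompt
    Returns an (N, C) tensor of class probabilities derived from
    temperature-scaled cosine similarity.
    """
    img = F.normalize(image_emb, dim=-1)   # unit-length image vectors
    txt = F.normalize(prompt_embs, dim=-1) # unit-length prompt vectors
    logits = img @ txt.t() / temperature   # cosine similarity matrix
    return logits.softmax(dim=-1)

# Stand-in embeddings; in the paper these would come from QuiltNet encoders.
torch.manual_seed(0)
probs = classify_by_prompt_similarity(torch.randn(4, 512), torch.randn(2, 512))
print(probs.shape)  # one "plexus" vs "non-plexus" distribution per patch
```

Because the prompts are expert-reviewed text, each predicted class is tied to a human-readable concept, which is the interpretability mechanism the abstract emphasizes.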
Problem

Research questions and friction points this paper is trying to address.

Detecting myenteric plexus regions in Hirschsprung's disease
Integrating expert knowledge into vision-language models
Improving interpretability and clinical relevance of classifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates expert textual concepts into vision-language model
Uses large language models to generate medical prompts
Aligns clinical semantic cues with visual features
Youssef Megahed
Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
Atallah Madi
Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
Dina El Demellawy
Department of Pathology, Children’s Hospital of Eastern Ontario (CHEO), Ottawa, Ontario, Canada
Adrian D. C. Chan
Professor, Carleton University
biomedical signal processing, biomedical image processing, machine learning, physiological monitoring, accessibility