🤖 AI Summary
This paper addresses automatic medical report generation from histopathological whole-slide images (WSIs) and introduces PathGenIC, presented as the first framework to apply multimodal in-context learning (ICL) to this task. PathGenIC builds on a vision-language model and uses both WSIs and their textual pathology reports. For each new slide, it retrieves semantically similar WSI-report pairs from the training set as in-context examples and applies an adaptive feedback mechanism to refine the generated report. Because the context is drawn from existing training data, no additional annotations are required. Key contributions include: (1) a retrieval-augmented, multimodal ICL paradigm designed specifically for pathology report generation; and (2) state-of-the-art results on the HistGen benchmark, with improvements in BLEU, METEOR, and ROUGE-L scores and robustness across disease categories and report lengths.
📝 Abstract
Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.
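The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each WSI has already been reduced to a fixed-length embedding, uses cosine similarity as the similarity measure, and builds a text-only prompt, whereas the actual framework also feeds visual features to the model. The function names (`cosine`, `retrieve_top_k`, `build_prompt`), the toy embeddings, and the example reports are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], str]  # (slide embedding, report text)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Sequence[float], bank: List[Pair], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k (score, report) pairs most similar to the query embedding."""
    scored = [(cosine(query, emb), report) for emb, report in bank]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(examples: List[Tuple[float, str]]) -> str:
    """Lay out retrieved reports as in-context examples, then ask for a new report."""
    lines = [f"Example {i} report: {report}" for i, (_, report) in enumerate(examples, 1)]
    lines.append("Report for the new slide:")
    return "\n".join(lines)


# Toy training bank: hypothetical 3-dimensional embeddings and made-up reports.
bank: List[Pair] = [
    ([1.0, 0.0, 0.0], "Invasive ductal carcinoma, grade 2."),
    ([0.0, 1.0, 0.0], "Benign fibroadenoma."),
    ([0.9, 0.1, 0.0], "Invasive ductal carcinoma, grade 3."),
]
query = [1.0, 0.05, 0.0]  # embedding of the new slide (assumed given)

top = retrieve_top_k(query, bank, k=2)
prompt = build_prompt(top)
print(prompt)
```

In this toy setup the two carcinoma reports are retrieved and the unrelated benign case is left out of the prompt. How the embeddings are computed, how many examples are retrieved, and how the adaptive feedback is applied are details the abstract does not specify.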