Accurate and Scalable Multimodal Pathology Retrieval via Attentive Vision-Language Alignment

📅 2025-10-27
🤖 AI Summary
To address the challenge of retrieving whole-slide images (WSIs), this paper proposes PathSearch, a framework that captures fine-grained tissue morphology through attentive mosaic representations and aligns slide-level embeddings with report text through vision-language contrastive learning, supporting both image-to-image and text-to-image retrieval. Methodologically, it combines sliding-window patch encoding, pathology report text embedding, and attention-enhanced multimodal contrastive training. Evaluated on four public datasets and three in-house multi-center cohorts, PathSearch outperforms traditional image-to-image retrieval frameworks on tasks including tumor subtyping and grading. A multi-center reader study shows improved diagnostic accuracy and better inter-observer agreement among pathologists. By providing a scalable and generalizable multimodal retrieval approach, PathSearch supports precise diagnosis and example-based pathology education.

📝 Abstract
The rapid digitization of histopathology slides has opened up new possibilities for computational tools in clinical and research workflows. Among these, content-based slide retrieval stands out, enabling pathologists to identify morphologically and semantically similar cases, thereby supporting precise diagnoses, enhancing consistency across observers, and assisting example-based education. However, effective retrieval of whole slide images (WSIs) remains challenging due to their gigapixel scale and the difficulty of capturing subtle semantic differences amid abundant irrelevant content. To overcome these challenges, we present PathSearch, a retrieval framework that unifies fine-grained attentive mosaic representations with global slide embeddings aligned through vision-language contrastive learning. Trained on a corpus of 6,926 slide-report pairs, PathSearch captures both fine-grained morphological cues and high-level semantic patterns to enable accurate and flexible retrieval. The framework supports two key functionalities: (1) mosaic-based image-to-image retrieval, ensuring accurate and efficient slide search; and (2) multi-modal retrieval, where text queries can directly retrieve relevant slides. PathSearch was rigorously evaluated on four public pathology datasets and three in-house cohorts, covering tasks including anatomical site retrieval, tumor subtyping, tumor vs. non-tumor discrimination, and grading across diverse organs such as breast, lung, kidney, liver, and stomach. External results show that PathSearch outperforms traditional image-to-image retrieval frameworks. A multi-center reader study further demonstrates that PathSearch improves diagnostic accuracy, boosts confidence, and enhances inter-observer agreement among pathologists in real clinical scenarios. These results establish PathSearch as a scalable and generalizable retrieval solution for digital pathology.
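The abstract does not say how queries are matched against stored slides. A common approach in embedding-based retrieval is to rank items by cosine similarity to the query in a shared vector space, and the sketch below illustrates that under stated assumptions. It is not the authors' implementation: the function names, the slide IDs, and the three-dimensional toy embeddings are all hypothetical. In a real system the vectors would come from trained slide and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_top_k(query_embedding, slide_index, k=3):
    """Rank indexed slides by similarity to the query and return the k best
    (slide_id, score) pairs, highest score first."""
    scored = [
        (slide_id, cosine_similarity(query_embedding, emb))
        for slide_id, emb in slide_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index; real embeddings would be high-dimensional.
slide_index = {
    "slide_A": [0.9, 0.1, 0.0],
    "slide_B": [0.1, 0.9, 0.0],
    "slide_C": [0.7, 0.3, 0.1],
}

# Hypothetical embedding of a text query, assumed to live in the same space.
text_query_embedding = [1.0, 0.0, 0.0]

results = retrieve_top_k(text_query_embedding, slide_index, k=2)
for slide_id, score in results:
    print(slide_id, round(score, 3))
```

Because the query and the slides are assumed to share one embedding space, the same ranking function would serve both image-to-image and text-to-image queries. Only the encoder that produces the query vector would differ.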
Problem

Research questions and friction points this paper is trying to address.

Retrieving gigapixel pathology slides with subtle semantic differences
Aligning fine-grained visual features with language for multimodal search
Improving diagnostic accuracy through content-based slide retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attentive vision-language alignment for pathology retrieval
Mosaic representations with slide embeddings for fine-grained analysis
Multimodal retrieval using text queries and image inputs
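The page names vision-language contrastive alignment but gives no loss formulation. The sketch below shows one widely used form, a symmetric CLIP-style InfoNCE loss over matched slide-report pairs, purely as an illustration. Treating this as the loss used in PathSearch is an assumption. The function names and the temperature value of 0.07 are hypothetical choices, and the inputs are assumed to be L2-normalized embeddings in which row i of each list belongs to the same pair.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(slide_embs, text_embs, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss (illustrative assumption).

    slide_embs[i] and text_embs[i] are assumed to be a matched,
    L2-normalized pair; every other combination is a negative.
    """
    n = len(slide_embs)
    # Pairwise similarity logits: rows index slides, columns index reports.
    logits = [
        [sum(a * b for a, b in zip(s, t)) / temperature for t in text_embs]
        for s in slide_embs
    ]
    # Slide-to-text direction: each slide should pick its own report.
    s2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-slide direction: each report should pick its own slide.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2s = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (s2t + t2s)


# Toy check: matched pairs give a near-zero loss, swapped pairs a large one.
slides = [[1.0, 0.0], [0.0, 1.0]]
reports_matched = [[1.0, 0.0], [0.0, 1.0]]
reports_swapped = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = symmetric_contrastive_loss(slides, reports_matched)
mismatched_loss = symmetric_contrastive_loss(slides, reports_swapped)
print(aligned_loss < mismatched_loss)
```

Minimizing a loss of this kind pulls each slide embedding toward its paired report and pushes it away from the other reports in the batch. That shared space is what would let a text query retrieve slides directly.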
Authors

Hongyi Wang
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Zhengjie Zhu
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Jiabo Ma
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China
Fang Wang
Postdoc, Stanford University
Reading acquisition, dyslexia, cross-linguistic research, bilingualism, cognitive neuroscience
Yue Shi
Department of Pathology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China
Bo Luo
Department of Pathology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Jili Wang
Department of Pathology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
Qiuyu Cai
Department of Pathology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
Xiuming Zhang
Department of Pathology, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
Yen-Wei Chen
Ritsumeikan University
Image processing, pattern recognition, medical image analysis
Lanfen Lin
Zhejiang Key Laboratory of Multi-omics Precision Diagnosis and Treatment of Liver Diseases, Zhejiang University, Hangzhou 310063, China
Hao Chen
Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR, China