CTIS-QA: Clinical Template-Informed Slide-Level Question Answering for Pathology

📅 2025-12-15
🏛️ IEEE International Conference on Bioinformatics and Biomedicine
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of clinical grounding, standardized data, and diagnostic workflow alignment in existing pathological visual question answering (VQA) methods. To bridge this gap, the authors construct a Clinical Pathology Report Template (CPRT) based on the College of American Pathologists (CAP) cancer protocols, enabling systematic extraction of pathological features from reports. They introduce CTIS-Bench, the first slide-level VQA benchmark clinically aligned with real-world diagnostic practice, and curate the CTIS-Align dataset for vision-language alignment training. They also propose CTIS-QA, a dual-stream architecture that mimics pathologists' "global screening, local focusing" visual strategy. Experiments show that CTIS-QA consistently outperforms state-of-the-art methods on WSI-VQA, CTIS-Bench, and multiple downstream slide-level diagnostic tasks.
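To illustrate how a report template can drive closed-ended question generation, here is a minimal sketch. It is not the authors' pipeline: the field names, option lists, question wording, and the `build_qa` helper are all hypothetical, chosen only to mirror the examples the paper mentions (tumor grade, receptor status).

```python
from dataclasses import dataclass

# Hypothetical template: each field lists the only values it may take.
TEMPLATE = {
    "histologic_grade": ["Grade 1", "Grade 2", "Grade 3"],
    "er_status": ["Positive", "Negative"],
}

# Hypothetical question stems; none of them reveals the answer.
QUESTIONS = {
    "histologic_grade": "What is the histologic grade of the tumor?",
    "er_status": "What is the estrogen receptor status?",
}


@dataclass
class QAPair:
    slide_id: str
    question: str
    options: list
    answer: str


def build_qa(slide_id, extracted):
    """Turn template fields extracted from one report into closed-ended QA pairs."""
    pairs = []
    for field, options in TEMPLATE.items():
        value = extracted.get(field)
        if value not in options:
            continue  # skip fields that are missing or outside the template
        pairs.append(QAPair(slide_id, QUESTIONS[field], options, value))
    return pairs


# Example: one report with a valid grade and an unusable receptor value.
example = build_qa("slide-001", {"histologic_grade": "Grade 2", "er_status": "Unknown"})
```

Restricting answers to a fixed option list is what keeps the questions closed-ended, and dropping out-of-template values is one simple way to avoid carrying unverified report text into a benchmark.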

📝 Abstract
Multimodal large language models (MLLMs) have demonstrated strong performance in patch-level pathological image analysis; however, they often lack the holistic perceptual capability necessary for comprehensive Whole Slide Image (WSI) interpretation. Recent approaches have explored constructing slide-level MLLMs using VQA datasets that are entirely generated from pathology reports by large language models (LLMs). However, these datasets suffer from critical limitations: hallucinated content, information leakage in question stems, clinically irrelevant or visually independent questions, and the omission of essential diagnostic features, all of which undermine both data quality and clinical validity. In this paper, we introduce a clinical diagnosis template-based pipeline to collect pathological information. In collaboration with pathologists and guided by the College of American Pathologists (CAP) Cancer Protocols, we design a Clinical Pathology Report Template (CPRT) that ensures comprehensive and standardized extraction of diagnostic elements from pathology reports. We validate the effectiveness of our pipeline on TCGA-BRCA. First, we extract pathological features from reports using CPRT. These features are then used to build CTIS-Align, a dataset of 80k slide-description pairs from 804 WSIs for vision-language alignment training, and CTIS-Bench, a rigorously curated VQA benchmark comprising 977 WSIs and 14,879 question-answer pairs. CTIS-Bench emphasizes clinically grounded, closed-ended questions (e.g., tumor grade, receptor status) that reflect real diagnostic workflows, minimize non-visual reasoning, and require genuine slide understanding. We further propose CTIS-QA, a slide-level question answering model featuring a dual-stream architecture that mimics pathologists' diagnostic approach. One stream captures global slide-level context via clustering-based feature aggregation, while the other focuses on salient local regions through an attention-guided patch perception module. Extensive experiments on WSI-VQA, CTIS-Bench, and slide-level diagnostic tasks show that CTIS-QA consistently outperforms existing state-of-the-art models across multiple metrics. We will fully release both CTIS-Bench and CTIS-QA as open-source resources.
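The dual-stream idea can be sketched in a few lines of plain Python. This is an illustrative approximation under stated assumptions, not the CTIS-QA implementation: k-means stands in for the unspecified clustering-based aggregation, softmax dot-product scoring against a query vector stands in for the attention-guided patch perception module, and every function name and parameter below is hypothetical.

```python
import math
import random


def kmeans_pool(feats, k, iters=10, seed=0):
    """Global stream (assumed k-means): summarize all patches as k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(feats, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in feats:
            nearest = min(range(k), key=lambda c: math.dist(f, centroids[c]))
            groups[nearest].append(f)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def attention_top_patches(feats, query, top_k):
    """Local stream (assumed dot-product attention): keep the top_k patches."""
    scores = [sum(a * b for a, b in zip(f, query)) for f in feats]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    ranked = sorted(range(len(feats)), key=lambda i: weights[i], reverse=True)
    keep = ranked[:top_k]
    return [feats[i] for i in keep], [weights[i] for i in keep]


def dual_stream_tokens(feats, query, k=4, top_k=8):
    """Concatenate global cluster tokens with locally salient patch tokens."""
    global_tokens = kmeans_pool(feats, k)
    local_tokens, _ = attention_top_patches(feats, query, top_k)
    return global_tokens + local_tokens


# Toy usage: 20 three-dimensional patch features and one query vector.
patches = [[float(i), float(i % 3), 1.0] for i in range(20)]
tokens = dual_stream_tokens(patches, [1.0, 0.0, 0.0], k=3, top_k=5)
```

The point of the split is that the token count passed downstream is `k + top_k` regardless of how many patches the slide contains, while still keeping a coarse whole-slide summary alongside a handful of high-attention regions.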
Problem

Research questions and friction points this paper is trying to address.

slide-level question answering
pathology
vision-language alignment
clinical diagnosis
whole-slide image
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clinical Template
Slide-level Question Answering
Vision-Language Alignment
Dual-stream Architecture
Pathology Report Structuring
🔎 Similar Papers
Hao Lu
Associate Professor, Huazhong University of Science and Technology
Computer Vision, Deep Learning, Plant Phenotyping
Ziniu Qian
School of Biological Science and Medical Engineering, Beihang University, Beijing, China
Yifu Li
School of Biological Science and Medical Engineering, Beihang University, Beijing, China
Yang Zhou
School of Biological Science and Medical Engineering, Beihang University, Beijing, China
Bingzheng Wei
ByteDance Inc., Beijing, China
Yan Xu
Professor of Biological Science and Medical Engineering, Beihang University
machine learning, deep learning, medical imaging, medical information