PathVG: A New Benchmark and Dataset for Pathology Visual Grounding

📅 2025-02-28
🤖 AI Summary
Current computational pathology methods have two key limitations: nucleus segmentation relies on predefined categories, and pathology visual question answering cannot localize regions. To address these, this work introduces PathVG, a new pathology visual grounding benchmark, alongside the RefPath dataset (27,610 images with 33,500 language-grounded bounding boxes). Unlike visual grounding in other domains, PathVG involves multi-scale pathological images and expressions that carry pathological knowledge. The baseline model, the Pathology Knowledge-enhanced Network (PKNet), uses large language models (LLMs) to turn pathological terms with implicit information into explicit visual features, and its Knowledge Fusion Module (KFM) fuses these knowledge features with expression features. PKNet achieves state-of-the-art performance on the PathVG benchmark.

📝 Abstract
With the rapid development of computational pathology, many AI-assisted diagnostic tasks have emerged. Cellular nuclei segmentation can segment various types of cells for downstream analysis, but it relies on predefined categories and lacks flexibility. Moreover, pathology visual question answering can perform image-level understanding but lacks region-level detection capability. To address these limitations, we propose a new benchmark called Pathology Visual Grounding (PathVG), which aims to detect regions based on expressions with different attributes. To evaluate PathVG, we create a new dataset named RefPath, which contains 27,610 images with 33,500 language-grounded boxes. Compared to visual grounding in other domains, PathVG presents pathological images at multiple scales and contains expressions with pathological knowledge. In the experimental study, we found that the biggest challenge was the implicit information underlying the pathological expressions. Based on this, we propose the Pathology Knowledge-enhanced Network (PKNet) as the baseline model for PathVG. PKNet leverages the knowledge-enhancement capabilities of Large Language Models (LLMs) to convert pathological terms with implicit information into explicit visual features, and fuses knowledge features with expression features through the designed Knowledge Fusion Module (KFM). The proposed method achieves state-of-the-art performance on the PathVG benchmark.
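To make the grounding task concrete, here is a minimal sketch of how a predicted box could be scored against a language-grounded box. It assumes the intersection-over-union (IoU) criterion with a 0.5 threshold that is common in general visual grounding; the abstract does not say which metric PathVG uses. The `(x1, y1, x2, y2)` box format and the function names `iou` and `grounding_accuracy` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (empty if the boxes are disjoint).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds: List[Box], gts: List[Box],
                       threshold: float = 0.5) -> float:
    """Fraction of expressions whose predicted box reaches the IoU threshold.

    The 0.5 default is an assumption borrowed from general visual grounding,
    not a value taken from the PathVG paper.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)
```

Under this assumed criterion, each referring expression counts as correct only when its single predicted box overlaps the annotated box sufficiently, so the score is reported per expression rather than per image.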
Problem

Research questions and friction points this paper is trying to address.

Nucleus segmentation relies on predefined categories and lacks flexibility
Pathology visual question answering understands images as a whole but cannot detect regions
Pathological expressions carry implicit information, which is the biggest challenge for grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes PathVG benchmark for pathology visual grounding
Introduces PKNet, which uses LLM-derived knowledge to make implicit pathological terms explicit
Develops RefPath dataset with multi-scale pathological images
Chunlin Zhong
Huazhong University of Science and Technology
Computer Vision
Shuang Hao
School of Software Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
Junhua Wu
Department of Pathology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
Xiaona Chang
Department of Pathology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
Jiwei Jiang
Huazhong University of Science and Technology
Computer Vision
Xiu Nie
Department of Pathology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
He Tang
Huazhong University of Science and Technology
Computer Vision, Machine Learning, Medical Image Analysis
Xiang Bai
Huazhong University of Science and Technology (HUST)
Computer Vision, OCR