MediSee: Reasoning-based Pixel-level Perception in Medical Images

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical image segmentation methods rely heavily on precise bounding boxes or domain-specific textual prompts, exhibiting poor generalizability to natural, non-expert user queries. Method: We propose Medical Reasoning Segmentation and Detection (MedSD), a novel task requiring pixel-level segmentation and object localization solely from colloquial, logic-implicit natural language queries. To this end, we formally define the MedSD task; introduce MLMR-SD, a multi-perspective, logic-driven dataset enabling annotation-free, box-free, and terminology-free input; and design MediSee, an end-to-end architecture integrating vision-language understanding, logical reasoning modeling, and multi-granularity localization decoding, augmented by an implicit-reasoning prompt-guided cross-modal alignment mechanism. Contribution/Results: Extensive experiments demonstrate that MediSee significantly outperforms conventional medical referring segmentation approaches on MedSD, validating its robust comprehension of lay-user queries and accurate pixel-level localization without requiring expert annotations or structured prompts.
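The summary above describes the MedSD task contract: given an image and a colloquial, logic-implicit query, the model must return both a pixel-level mask and a bounding box. A minimal sketch of that interface, with a trivial intensity threshold standing in for the actual MediSee reasoning model (the function name and placeholder logic are assumptions for illustration, not the paper's method):

```python
import numpy as np

def medsd_infer(image: np.ndarray, query: str):
    """Toy stand-in for a MedSD model: returns (mask, bbox).

    The real MediSee model conditions on the colloquial query via
    vision-language reasoning; here we merely threshold intensities
    to produce a placeholder binary mask, then derive the bounding
    box (x_min, y_min, x_max, y_max) from the mask's extent.
    """
    mask = (image > image.mean()).astype(np.uint8)  # placeholder "segmentation"
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return mask, None  # nothing localized for this query
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return mask, bbox

# Usage: a synthetic 8x8 "scan" with one bright region.
img = np.zeros((8, 8), dtype=float)
img[2:5, 3:6] = 1.0
mask, bbox = medsd_infer(img, "what part of this scan looks swollen?")
print(bbox)  # (3, 2, 5, 4)
```

The point of the sketch is the output pairing: one forward pass yields both granularities of localization, which is what distinguishes MedSD from mask-only referring segmentation.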

📝 Abstract
Despite remarkable advancements in pixel-level medical image perception, existing methods are either limited to specific tasks or rely heavily on accurate bounding boxes or text labels as input prompts. However, the medical knowledge required to form such inputs is a major obstacle for the general public, which greatly limits the universality of these methods. Rather than supplying this domain-specialized auxiliary information, general users tend to rely on colloquial queries that require logical reasoning. In this paper, we introduce a novel medical vision task: Medical Reasoning Segmentation and Detection (MedSD), which aims to comprehend implicit queries about medical images and generate the corresponding segmentation mask and bounding box for the target object. To accomplish this task, we first introduce a Multi-perspective, Logic-driven Medical Reasoning Segmentation and Detection (MLMR-SD) dataset, which encompasses a substantial collection of medical entity targets along with their corresponding reasoning. Furthermore, we propose MediSee, an effective baseline model designed for medical reasoning segmentation and detection. The experimental results indicate that the proposed method effectively addresses MedSD with implicit colloquial queries and outperforms traditional medical referring segmentation methods.
Problem

Research questions and friction points this paper is trying to address.

Overcoming reliance on specialized medical input for image analysis
Enabling logical reasoning for medical image segmentation and detection
Addressing implicit colloquial queries in medical vision tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-perspective logic-driven medical reasoning dataset
MediSee model for implicit colloquial queries
Segmentation and detection without domain-specialized inputs
Qinyue Tong
Zhejiang University, Hangzhou, Zhejiang, China
Ziqian Lu
Zhejiang University; Zhejiang Sci-Tech University
Zero-Shot Learning · Multi-modal · LLM · Contrastive Learning
Jun Liu
Zhejiang University, Hangzhou, Zhejiang, China
Yangming Zheng
Zhejiang University, Hangzhou, Zhejiang, China
Zheming Lu
Zhejiang University, Hangzhou, Zhejiang, China