MediSee: Reasoning-based Pixel-level Perception in Medical Images

📅 2025-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing medical image segmentation methods rely heavily on precise bounding boxes or domain-specific text prompts, and they generalize poorly to the natural queries of non-expert users. Method: We propose Medical Reasoning Segmentation and Detection (MedSD), a novel task that requires a pixel-level segmentation mask and a bounding box for the target object, given only a colloquial query whose target must be inferred through reasoning. To this end, we formally define the MedSD task; introduce MLMR-SD, a multi-perspective, logic-driven dataset whose queries need no bounding boxes or specialist terminology; and design MediSee, an end-to-end architecture integrating vision-language understanding, logical reasoning modeling, and multi-granularity localization decoding, augmented by an implicit-reasoning, prompt-guided cross-modal alignment mechanism. Contribution/Results: Experiments indicate that MediSee outperforms traditional medical referring segmentation approaches on MedSD, handling lay-user queries and localizing targets at the pixel level without expert-supplied boxes or structured prompts.

📝 Abstract

Despite remarkable advancements in pixel-level medical image perception, existing methods are either limited to specific tasks or rely heavily on accurate bounding boxes or text labels as input prompts. However, the medical knowledge required to provide such input is a major obstacle for the general public, which greatly reduces the universality of these methods. Rather than supplying this domain-specialized auxiliary information, general users tend to pose colloquial queries that require logical reasoning to resolve. In this paper, we introduce a novel medical vision task: Medical Reasoning Segmentation and Detection (MedSD), which aims to comprehend implicit queries about medical images and generate the corresponding segmentation mask and bounding box for the target object. To accomplish this task, we first introduce a Multi-perspective, Logic-driven Medical Reasoning Segmentation and Detection (MLMR-SD) dataset, which encompasses a substantial collection of medical entity targets along with their corresponding reasoning. Furthermore, we propose MediSee, an effective baseline model designed for medical reasoning segmentation and detection. The experimental results indicate that the proposed method can effectively address MedSD with implicit colloquial queries and outperforms traditional medical referring segmentation methods.
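
To make the task's output concrete, the following is a minimal sketch of a MedSD-style prediction (a segmentation mask plus a bounding box for one query) and of intersection-over-union (IoU) scoring for both. It is not taken from the paper: the `MedSDPrediction` container, the function names, the inclusive box convention, and the choice of IoU as the score are all illustrative assumptions, since the abstract does not state how masks and boxes are represented or evaluated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Mask = List[List[int]]            # binary mask as rows of 0/1 (assumed layout)
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive pixels


@dataclass
class MedSDPrediction:
    """Hypothetical container for one output: the query, a mask, and a box."""
    query: str
    mask: Mask
    box: Optional[Box]


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tightest box around all foreground pixels, or None for an empty mask."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = union = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match (a convention, not a rule).
    return inter / union if union else 1.0


def box_iou(a: Box, b: Box) -> float:
    """IoU of two boxes given in inclusive pixel coordinates."""
    ix = min(a[2], b[2]) - max(a[0], b[0]) + 1
    iy = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(ix, 0) * max(iy, 0)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


# Toy 4x4 example: the predicted region is one column wider than the target.
target = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred_mask = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
pred = MedSDPrediction(
    query="Which part of this scan looks abnormal?",  # made-up colloquial query
    mask=pred_mask,
    box=mask_to_box(pred_mask),
)
```

In this toy case the predicted box is `(1, 1, 3, 2)` against a target box of `(1, 1, 2, 2)`, and both the mask IoU and the box IoU come to 4/6. Deriving the box from the mask is only one possible design; a model could equally predict the two outputs separately.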

Problem

Research questions and friction points this paper is trying to address.

Overcoming reliance on specialized medical input for image analysis

Enabling logical reasoning for medical image segmentation and detection

Addressing implicit colloquial queries in medical vision tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-perspective logic-driven medical reasoning dataset

MediSee model for implicit colloquial queries

Segmentation and detection without domain-specialized inputs

🔎 Similar Papers

Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis