🤖 AI Summary
Existing medical image segmentation methods are predominantly single-task and single-turn, limiting their ability to support clinically relevant, multi-step reasoning for complex entity segmentation. To address this, we propose a novel task, multi-turn entity-level medical reasoning segmentation, and introduce MR-MedSeg, the first large-scale multi-turn medical segmentation dialogue dataset, comprising 177,000 dialogues. We design a lightweight Judgment & Correction mechanism that tracks entity states across turns, assesses intermediate results, and corrects errors. Our method, MediRound, is built upon a text-prompting framework; it jointly models dialogue history and updates entity states, mitigating error accumulation across turns. Evaluated on MR-MedSeg, our approach consistently outperforms single-turn referring-expression segmentation baselines. The results indicate that multi-turn entity-level reasoning improves segmentation accuracy and robustness, which matters for real-world medical applications.
📝 Abstract
Despite progress in medical image segmentation, most existing methods remain task-specific and lack interactivity. Although recent text-prompt-based approaches support user-driven and reasoning-based segmentation, they remain confined to single-round dialogues and cannot reason across rounds. In this work, we introduce Multi-Round Entity-Level Medical Reasoning Segmentation (MEMR-Seg), a new task that requires generating segmentation masks through multi-round queries with entity-level reasoning. To support this task, we construct MR-MedSeg, a large-scale dataset of 177K multi-round medical segmentation dialogues featuring entity-based reasoning across rounds. Furthermore, we propose MediRound, an effective baseline model designed for multi-round medical reasoning segmentation. To mitigate the error propagation inherent in the chain-like pipeline of multi-round segmentation, we introduce a lightweight yet effective Judgment & Correction Mechanism at inference time. Experimental results demonstrate that our method effectively addresses the MEMR-Seg task and outperforms conventional medical referring segmentation methods.