MedVeriSeg: Teaching MLLM-Based Medical Segmentation Models to Verify Query Validity Without Extra Training

📅 2026-04-11

🤖 AI Summary
Current MLLM-based medical segmentation models struggle to recognize and reject erroneous queries that ask them to segment targets absent from the medical image. Instead, they often generate hallucinated masks that compromise clinical reliability. This work proposes a training-free verification framework that enables LISA-style models to detect such false queries. The method analyzes the strength, compactness, and purity of the similarity map between the [SEG] token feature and the MLLM image features to quantitatively assess whether the target is present. It then uses GPT-4o to jointly assess the similarity heatmap and the computed scores for final verification. On a small-scale benchmark derived from SA-Med2D-20M, the approach effectively rejects false queries while maintaining reliable recognition of true queries.
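
The summary does not say how the similarity map is computed. Below is a minimal sketch, assuming the map is the per-patch cosine similarity between the [SEG] token feature and an H x W grid of MLLM image patch features. The functions `cosine` and `similarity_map` and the toy feature grid are hypothetical. Real features would be high-dimensional tensors, not short Python lists.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_map(seg_feature: Vector, image_features: List[List[Vector]]) -> List[List[float]]:
    """Hypothetical: similarity of the [SEG] feature to every patch in an H x W grid."""
    return [[cosine(seg_feature, patch) for patch in row] for row in image_features]


# Toy 2 x 2 grid of 2-D patch features (illustrative values, not real model outputs).
seg = [1.0, 0.0]
patches = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
sim = similarity_map(seg, patches)
print([[round(v, 3) for v in row] for row in sim])  # [[1.0, 0.0], [0.707, -1.0]]
```

Patches aligned with the [SEG] feature score near 1, and unrelated or opposing patches score near 0 or below. A map like this is what the downstream scoring step inspects.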

📝 Abstract
Despite recent advances in MLLM-based medical image segmentation, existing LISA-like methods cannot reliably reject false queries and often produce hallucinated segmentation masks for absent targets. This limitation reduces their practical reliability in both medical education and clinical use. In this work, we propose MedVeriSeg, a training-free verification framework that equips LISA-like medical segmentation models with the ability to identify and reject false queries that contain non-existent targets. Our key observation is that the similarity map between the [SEG] token feature and the MLLM image features exhibits markedly different distribution patterns for true and false queries. Based on this, we introduce a Similarity Response Quality Scoring Module that characterizes the similarity map in terms of three aspects (strength, compactness, and purity) and produces an initial target-existence prediction. We further incorporate qualitative visual evidence by using GPT-4o to jointly assess the similarity heatmap and the output of the Similarity Response Quality Scoring Module for final verification. Experiments on a small-scale benchmark constructed from SA-Med2D-20M show that MedVeriSeg effectively rejects false-query segmentation requests while maintaining reliable recognition of true queries.
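
The abstract names the three aspects scored by the Similarity Response Quality Scoring Module but does not give their formulas. The sketch below is one plausible reading, not the paper's definitions. Strength is taken as the mean of the k largest similarity values. Compactness is the share of above-threshold cells that fall in the largest 4-connected region. Purity is the share of positive response mass that lies inside the above-threshold region. The equal weights, the threshold `thr`, the cutoff `tau`, and all function names are hypothetical. The GPT-4o verification step is not modeled.

```python
from collections import deque
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def strength(sim: Grid, k: int = 3) -> float:
    """Hypothetical strength: mean of the k largest similarity values."""
    top = sorted((v for row in sim for v in row), reverse=True)[:k]
    return sum(top) / len(top)


def largest_region(mask: List[List[bool]]) -> int:
    """Size of the largest 4-connected region of True cells (breadth-first search)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for i in range(h):
        for j in range(w):
            if not mask[i][j] or seen[i][j]:
                continue
            seen[i][j] = True
            queue, size = deque([(i, j)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def compactness(sim: Grid, thr: float) -> float:
    """Hypothetical compactness: fraction of above-threshold cells in the largest region."""
    mask = [[v >= thr for v in row] for row in sim]
    count = sum(cell for row in mask for cell in row)
    return largest_region(mask) / count if count else 0.0


def purity(sim: Grid, thr: float) -> float:
    """Hypothetical purity: share of positive response mass at or above the threshold."""
    total = sum(max(v, 0.0) for row in sim for v in row)
    inside = sum(v for row in sim for v in row if v >= thr and v > 0.0)
    return inside / total if total > 0.0 else 0.0


def initial_prediction(sim: Grid, thr: float = 0.5, tau: float = 0.6,
                       weights: Sequence[float] = (1 / 3, 1 / 3, 1 / 3)) -> Tuple[bool, float]:
    """Weighted score of the three aspects; predict 'target present' if score >= tau."""
    parts = (strength(sim), compactness(sim, thr), purity(sim, thr))
    score = sum(w * p for w, p in zip(weights, parts))
    return score >= tau, score


# Toy maps (illustrative values): one compact strong blob vs. scattered weak peaks.
true_like = [[0.1, 0.1, 0.1, 0.1],
             [0.1, 0.9, 0.8, 0.1],
             [0.1, 0.8, 0.9, 0.1],
             [0.1, 0.1, 0.1, 0.1]]
false_like = [[0.55, 0.2, 0.3, 0.52],
              [0.3, 0.35, 0.3, 0.3],
              [0.3, 0.3, 0.4, 0.3],
              [0.51, 0.3, 0.3, 0.3]]
print(initial_prediction(true_like)[0], initial_prediction(false_like)[0])  # True False
```

Compactness and purity are ratios, so under these assumed definitions they favor a single concentrated response over isolated peaks scattered across the image. This matches the abstract's observation that true and false queries produce differently distributed similarity maps. How the paper actually weights and thresholds its scores is not stated here.
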
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
false query rejection
hallucinated segmentation
query validity verification
MLLM
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free verification
MLLM-based segmentation
false query rejection
similarity response quality scoring
medical image hallucination mitigation