Adaptive Detector-Verifier Framework for Zero-Shot Polyp Detection in Open-World Settings

📅 2025-12-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In open-world colonoscopy, zero-shot polyp detection suffers severe performance degradation due to illumination variations, motion blur, and occlusions. To address this, we propose a two-stage detector–verifier adaptive framework. Our method introduces a vision-language model (VLM)-guided frame-level dynamic confidence thresholding mechanism and a clinically motivated asymmetric cost-sensitive reward function, designed to penalize false negatives more heavily, to guide verifier refinement. We further employ Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm, for verifier fine-tuning. Built on a YOLOv11 detector and evaluated on a synthetic degradation benchmark, our approach achieves 14–22 percentage-point improvements in recall under zero-shot degradation settings on CVC-ClinicDB and Kvasir-SEG, while keeping precision within 1.7 percentage points of the baseline. This significantly mitigates false-negative risk in clinical deployment.
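The frame-level dynamic thresholding idea can be sketched as follows. This is a minimal illustration, not the paper's actual rule: the `degradation_score` input (a VLM-estimated measure of blur, glare, or occlusion in [0, 1]) and the linear interpolation are assumptions introduced here.

```python
def adaptive_threshold(base_thresh: float,
                       degradation_score: float,
                       min_thresh: float = 0.10) -> float:
    """Lower the detector's confidence threshold on degraded frames.

    degradation_score in [0, 1]: 0 = clean frame, 1 = heavily degraded
    (a hypothetical VLM output; the paper's exact guidance signal may
    differ). Lowering the threshold under degradation trades precision
    for recall, matching the clinical priority of not missing polyps.
    """
    score = min(max(degradation_score, 0.0), 1.0)  # clamp to [0, 1]
    # Interpolate linearly between base_thresh (clean frame)
    # and min_thresh (fully degraded frame).
    return base_thresh - (base_thresh - min_thresh) * score
```

On a clean frame the detector keeps its default threshold; as the VLM reports heavier degradation, the threshold slides toward the floor so that low-confidence but genuine polyp boxes survive to the verifier stage.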

📝 Abstract
Polyp detectors trained on clean datasets often underperform in real-world endoscopy, where illumination changes, motion blur, and occlusions degrade image quality. Existing approaches struggle with the domain gap between controlled laboratory conditions and clinical practice, where adverse imaging conditions are prevalent. In this work, we propose AdaptiveDetector, a novel two-stage detector-verifier framework comprising a YOLOv11 detector and a vision-language model (VLM) verifier. The detector adaptively adjusts per-frame confidence thresholds under VLM guidance, while the verifier is fine-tuned with Group Relative Policy Optimization (GRPO) using an asymmetric, cost-sensitive reward function specifically designed to discourage missed detections, a critical clinical requirement. To enable realistic assessment under challenging conditions, we construct a comprehensive synthetic testbed by systematically degrading clean datasets with adverse conditions commonly encountered in clinical practice, providing a rigorous benchmark for zero-shot evaluation. Extensive zero-shot evaluation on synthetically degraded CVC-ClinicDB and Kvasir-SEG images demonstrates that our approach improves recall by 14 to 22 percentage points over YOLO alone, while precision stays between 0.7 points below and 1.7 points above the baseline. This combination of adaptive thresholding and cost-sensitive reinforcement learning achieves clinically aligned, open-world polyp detection with substantially fewer false negatives, thereby reducing the risk of missed precancerous polyps and improving patient outcomes.
Problem

Research questions and friction points this paper is trying to address.

Polyp detectors fail in real-world endoscopy due to image degradation
Domain gap exists between clean lab data and adverse clinical conditions
Missed detections of precancerous polyps pose critical clinical risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive thresholding guided by vision-language model for detection
Cost-sensitive reinforcement learning fine-tunes verifier with GRPO
Synthetic testbed simulates adverse clinical conditions for evaluation
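The cost-sensitive reward and the group-relative normalization used by GRPO can be sketched together. Both functions below are illustrative assumptions: the 4:1 false-negative-to-false-positive cost ratio and the per-group z-score advantage are standard choices introduced here, not the paper's reported coefficients.

```python
def asymmetric_reward(n_tp: int, n_fp: int, n_fn: int,
                      tp_reward: float = 1.0,
                      fp_cost: float = 1.0,
                      fn_cost: float = 4.0) -> float:
    """Cost-sensitive reward for verifier fine-tuning.

    fn_cost > fp_cost encodes the clinical asymmetry: a missed polyp
    (false negative) is penalized more than a spurious box (false
    positive). The 4:1 ratio is a hypothetical choice for illustration.
    """
    return tp_reward * n_tp - fp_cost * n_fp - fn_cost * n_fn


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: z-score each reward within its sampled group.

    GRPO avoids a learned value baseline by comparing each rollout's
    reward to the mean and std of its own group of rollouts.
    """
    mu = sum(rewards) / len(rewards)
    var = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    return [(r - mu) / (var ** 0.5 + eps) for r in rewards]
```

Under this sketch, a verification decision that drops a true polyp scores far worse than one that lets a false alarm through, so the GRPO-normalized advantages push the verifier policy toward recall-preserving behavior.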
Shengkai Xu
College of Computer Science, Sichuan University, Chengdu, China
Hsiang Lun Kao
Columbia University, New York, NY, USA
Tianxiang Xu
Peking University
Honghui Zhang
College of Computer Science, Sichuan University, Chengdu, China
Junqiao Wang
College of Computer Science, Sichuan University, Chengdu, China
Runmeng Ding
Apon AI and Brain-Computer Engineering Research Institute, Shanghai, China
Guanyu Liu
Faculty of Science and Technology, University of Macau, Macao, China
Tianyu Shi
University of Toronto
Zhenyu Yu
Faculty of Computer Science and Information Technology, University of Malaya, Malaysia
Guofeng Pan
Zhaolong Technology, Shenzhen, China
Ziqian Bi
Purdue University, West Lafayette, IN, USA
Yuqi Ouyang
Sichuan University