🤖 AI Summary
Underwater fish image analysis is difficult because imaging conditions are severely degraded and pixel-level annotation is prohibitively expensive. To address both problems, the authors propose FishDetector-R1, a unified multimodal large language model (MLLM) framework for underwater scenes that performs weakly supervised fish detection, segmentation, and counting. Its two core components are: (1) a *detect-to-count* prompting mechanism that keeps detections and counts spatially consistent; and (2) reinforcement learning with verifiable rewards (RLVR), which allows training from sparse point-level annotations alone. On the DeepFish benchmark, FishDetector-R1 improves AP by 20% and mIoU by 10% while reducing MAE by 30% and GAME by 35% relative to baselines. Experiments on other underwater datasets show that these gains carry over across domains. The result is a scalable approach to underwater visual analysis that reaches high accuracy with little annotation effort.
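To make the idea of a verifiable reward from point labels concrete, the sketch below scores a set of predicted boxes against annotated points. It is only an illustration under stated assumptions: the function names (`point_reward`, `count_from_detections`), the greedy matching, and the F1-style score are hypothetical and are not taken from the paper's reward design.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # (x, y)


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (borders included)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def count_from_detections(boxes: List[Box]) -> int:
    """Detect-to-count in its simplest form: the count is the number of boxes."""
    return len(boxes)


def point_reward(boxes: List[Box], points: List[Point]) -> float:
    """Hypothetical reward in [0, 1], not the paper's formula.

    Each annotated point is greedily matched to at most one unused box that
    contains it; the reward is the F1 score of those matches. Greedy matching
    is a simplification and is not guaranteed to find the best assignment.
    """
    if not boxes and not points:
        return 1.0  # correctly predicted an empty scene
    if not boxes or not points:
        return 0.0
    used = set()
    matched = 0
    for point in points:
        for i, box in enumerate(boxes):
            if i not in used and point_in_box(point, box):
                used.add(i)
                matched += 1
                break
    if matched == 0:
        return 0.0
    precision = matched / len(boxes)   # penalizes spurious boxes
    recall = matched / len(points)     # penalizes missed fish
    return 2 * precision * recall / (precision + recall)
```

Because the score can be computed from the model output and the point labels alone, it can be checked automatically, which is the property that makes a reward "verifiable" in this setting.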
📝 Abstract
Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified framework based on a multimodal large language model (MLLM) for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%. These improvements stem from two key components: a novel detect-to-count prompt that enforces spatially consistent detections and counts, and Reinforcement Learning from Verifiable Reward (RLVR), paired with a scalable training paradigm that leverages sparse point labels. Ablation studies further validate the effectiveness of this reward design. Moreover, the improvements generalize well to other underwater datasets, confirming strong cross-domain robustness. Overall, FishDetector-R1 provides a reliable and scalable solution for accurate marine visual understanding via weak supervision. The project page for FishDetector-R1 is https://umfieldrobotics.github.io/FishDetector-R1.
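For readers unfamiliar with the counting metrics named above, the following minimal sketch computes MAE over a set of images and the Grid Average Mean absolute Error (GAME) for a single image, using the standard definition in which level L splits the image into 4^L equal cells. The function names and the point-based input format are assumptions made for illustration; the evaluation code used in the paper may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def mae(pred_counts: List[int], true_counts: List[int]) -> float:
    """Mean absolute error between predicted and true per-image counts."""
    errors = [abs(p - t) for p, t in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def grid_counts(points: List[Point], width: float, height: float,
                level: int) -> List[List[int]]:
    """Count points in each cell of a 2^level x 2^level grid over the image."""
    cells = 2 ** level
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        # Clamp so points on the right or bottom edge fall in the last cell.
        col = min(int(x / width * cells), cells - 1)
        row = min(int(y / height * cells), cells - 1)
        counts[row][col] += 1
    return counts


def game_single(pred_points: List[Point], true_points: List[Point],
                width: float, height: float, level: int) -> int:
    """GAME(level) for one image: sum of per-cell absolute count errors."""
    pred = grid_counts(pred_points, width, height, level)
    true = grid_counts(true_points, width, height, level)
    return sum(abs(p - t)
               for pred_row, true_row in zip(pred, true)
               for p, t in zip(pred_row, true_row))
```

At level 0 the grid is a single cell, so GAME reduces to the absolute count error; higher levels also penalize predictions that have the right total but the wrong locations. A dataset-level score is the average of `game_single` over all images.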