FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting

📅 2025-12-01

🤖 AI Summary

Underwater fish image analysis is difficult because imaging is severely degraded and pixel-level annotation is prohibitively expensive. To address these challenges, the authors propose FishDetector-R1, the first unified multimodal large language model (MLLM) framework designed specifically for underwater scenes, which performs weakly supervised detection, segmentation, and counting. Its two core contributions are: (1) a *detect-to-count* prompting mechanism that keeps detections and counts spatially consistent; and (2) reinforcement learning from verifiable reward (RLVR), which allows training from sparse point-level annotations alone. On the DeepFish benchmark, FishDetector-R1 improves AP by 20% and mIoU by 10%, and reduces MAE by 30% and GAME by 35%, relative to baselines. The gains carry over to other underwater datasets, indicating cross-domain robustness. The work shows that accurate underwater visual analysis is achievable with minimal annotation effort.

📝 Abstract

Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified framework based on a multimodal large language model (MLLM) for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving average precision (AP) by 20% and mean intersection over union (mIoU) by 10%, while reducing mean absolute error (MAE) by 30% and grid average mean absolute error (GAME) by 35%. These improvements stem from two key components: a novel detect-to-count prompt that enforces spatially consistent detections and counts, and Reinforcement Learning from Verifiable Reward (RLVR), paired with a scalable training paradigm that relies on sparse point labels. Ablation studies further validate the effectiveness of this reward design. Moreover, the improvement generalizes well to other underwater datasets, confirming strong cross-domain robustness. Overall, FishDetector-R1 provides a reliable and scalable solution for accurate marine visual understanding via weak supervision. The project page for FishDetector-R1 is https://umfieldrobotics.github.io/FishDetector-R1.
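The two counting metrics quoted above can be made concrete with a short sketch. It follows the standard definitions: MAE averages the absolute count error per image, and GAME at level L splits each image into 4^L non-overlapping cells and sums the absolute count error per cell. The function names (`mae`, `grid_counts`, `game`) and the point-list input format are illustrative assumptions, not taken from the paper or its code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels; assumed input format


def mae(pred_counts: Sequence[int], true_counts: Sequence[int]) -> float:
    """Mean absolute error between predicted and true per-image counts."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("count lists must be non-empty and equal in length")
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(pred_counts)


def grid_counts(points: Sequence[Point], width: float, height: float,
                level: int) -> List[List[int]]:
    """Count points in each cell of a 2^level x 2^level grid over the image."""
    cells = 2 ** level
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        # Clamp so that points on the image border fall into an edge cell.
        col = min(max(int(x * cells / width), 0), cells - 1)
        row = min(max(int(y * cells / height), 0), cells - 1)
        counts[row][col] += 1
    return counts


def game(pred_points: Sequence[Sequence[Point]],
         true_points: Sequence[Sequence[Point]],
         sizes: Sequence[Tuple[float, float]],
         level: int) -> float:
    """Grid average mean absolute error at the given level.

    Level 0 uses a single cell per image, so it equals MAE on the counts.
    """
    if not true_points or not (len(pred_points) == len(true_points) == len(sizes)):
        raise ValueError("inputs must be non-empty and equal in length")
    total = 0
    for pred, true, (width, height) in zip(pred_points, true_points, sizes):
        pred_grid = grid_counts(pred, width, height, level)
        true_grid = grid_counts(true, width, height, level)
        total += sum(abs(a - b)
                     for pred_row, true_row in zip(pred_grid, true_grid)
                     for a, b in zip(pred_row, true_row))
    return total / len(true_points)
```

GAME penalizes a model that reports the right total while placing fish in the wrong part of the image, which MAE alone cannot detect. That makes it a natural companion to a method that ties counts to localized detections.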

Problem

Research questions and friction points this paper is trying to address.

Detects, segments, and counts fish in underwater images with weak supervision.

Addresses visual degradation and high annotation costs in marine monitoring.

Improves accuracy and robustness for cross-domain underwater ecological analysis.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified MLLM framework for detection, segmentation, and counting

Reinforcement learning with verifiable reward for weak supervision

Detect-to-count prompt ensures spatially consistent outputs
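To illustrate how sparse point labels can yield a verifiable reward, here is a minimal sketch. It is not the authors' reward design, which this page does not specify. It assumes that the model outputs bounding boxes plus a stated count, that each annotated fish is a single point, and that the reward mixes a point-in-box F1 score with a check that the stated count equals the number of boxes. The function names and the equal 0.5 weights are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max); assumed format
Point = Tuple[float, float]              # (x, y) point label; assumed format


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box, borders included."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def match_points_to_boxes(boxes: Sequence[Box], points: Sequence[Point]) -> int:
    """Greedily assign each point to the first unused box that contains it.

    Greedy assignment is a simplification: it is not guaranteed to find the
    maximum matching when boxes overlap.
    """
    used = set()
    matched = 0
    for point in points:
        for index, box in enumerate(boxes):
            if index not in used and contains(box, point):
                used.add(index)
                matched += 1
                break
    return matched


def point_reward(boxes: Sequence[Box], stated_count: int,
                 points: Sequence[Point]) -> float:
    """Hypothetical reward in [0, 1] computed from point labels only."""
    # Detect-to-count consistency: the stated count must equal the box count.
    consistency = 1.0 if stated_count == len(boxes) else 0.0
    if not boxes and not points:
        f1 = 1.0  # correctly predicting an empty image
    else:
        matched = match_points_to_boxes(boxes, points)
        precision = matched / len(boxes) if boxes else 0.0
        recall = matched / len(points) if points else 0.0
        f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    # Equal weights are an arbitrary choice for illustration.
    return 0.5 * f1 + 0.5 * consistency
```

Because both terms can be computed directly from the model output and the point labels, the reward needs no box or mask annotations, which is the property that makes point-level supervision usable for reinforcement fine-tuning.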
