🤖 AI Summary
Problem: Underwater object detection is hampered by optical distortions inherent in aquatic environments, which degrade low-level features (texture, edges, and color) and introduce noise, and by class imbalance in the available data. Method: This work conducts a systematic robustness evaluation of YOLOv8 through YOLOv12 across six simulated underwater conditions, using a unified set of 10,000 annotated images from the DUO and Roboflow100 datasets and benchmarking across both models and environments. It further evaluates two lightweight training strategies: noise-aware sample injection and fine-tuning on enhanced images. Contribution/Results: In one of the first comprehensive studies of its kind, the authors identify the robustness bottlenecks of YOLO models underwater. YOLOv12 achieves the highest overall accuracy but is highly sensitive to noise, which disrupts edge and texture features. Detection performance is driven primarily by image counts and instance frequency, with object appearance playing a secondary role. Noise-aware sample injection improves robustness in both noisy and real-world conditions, while fine-tuning with advanced enhancement raises accuracy in enhanced domains at the cost of slightly lower performance on the original data.
📝 Abstract
Underwater object detection (UOD) remains a critical challenge in computer vision because underwater distortions degrade low-level features and compromise the reliability of even state-of-the-art detectors. While YOLO models have become the backbone of real-time object detection, little work has systematically examined their robustness under these uniquely challenging conditions. This raises a critical question: Are YOLO models genuinely robust when operating under the chaotic and unpredictable conditions of underwater environments? In this study, we present one of the first comprehensive evaluations of recent YOLO variants (YOLOv8 through YOLOv12) across six simulated underwater environments. Using a unified dataset of 10,000 annotated images from DUO and Roboflow100, we not only benchmark model robustness but also analyze how distortions affect key low-level features such as texture, edges, and color. Our findings show that (1) YOLOv12 delivers the strongest overall performance but is highly vulnerable to noise, and (2) noise disrupts edge and texture features, explaining the poor detection performance in noisy images. Class imbalance is also a persistent challenge in UOD, and our experiments reveal that (3) image counts and instance frequency primarily drive detection performance, while object appearance exerts only a secondary influence. Finally, we evaluate two lightweight training-aware strategies: noise-aware sample injection, which improves robustness in both noisy and real-world conditions, and fine-tuning with advanced enhancement, which boosts accuracy in enhanced domains but slightly lowers performance on the original data, demonstrating strong potential for domain adaptation. Together, these insights provide practical guidance for building resilient and cost-efficient UOD systems.
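As a rough illustration of what noise-aware sample injection could look like in practice, the sketch below appends noisy copies of a random subset of training images to the training set. The abstract does not specify the noise model, the injection ratio, or the pipeline, so the choice of additive Gaussian noise, the `fraction` and `sigma` values, and the function names `add_gaussian_noise` and `inject_noisy_samples` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Return a copy of a uint8 image with zero-mean Gaussian noise added.

    Assumption: additive Gaussian noise; the paper's noise model may differ.
    """
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def inject_noisy_samples(images, fraction=0.2, sigma=25.0, seed=0):
    """Append noisy copies of a random subset of images to the training set.

    fraction and sigma are hypothetical defaults, not values from the paper.
    Returns (augmented_images, source_indices); source_indices[i] is the index
    of the original image that augmented_images[i] was derived from, so its
    annotations can be reused unchanged (pixel noise does not move boxes).
    """
    rng = np.random.default_rng(seed)
    n_noisy = int(round(fraction * len(images)))
    chosen = rng.choice(len(images), size=n_noisy, replace=False)
    noisy = [add_gaussian_noise(images[i], sigma, rng) for i in chosen]
    augmented = list(images) + noisy
    sources = list(range(len(images))) + [int(i) for i in chosen]
    return augmented, sources


if __name__ == "__main__":
    # Ten flat grey dummy images stand in for real underwater frames.
    dummy = [np.full((8, 8, 3), 128, dtype=np.uint8) for _ in range(10)]
    augmented, sources = inject_noisy_samples(dummy, fraction=0.2)
    print(len(augmented), sources[-2:])
```

Because the originals are kept and only extra noisy copies are added, the detector still sees clean data during training; whether that matches the balance used in the study is not stated in the abstract.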