AI Summary
Underwater visual recognition is severely hampered by degradation factors such as turbidity, low illumination, and occlusion. To address this, we introduce AQUA20, the first fine-grained, 20-class benchmark dataset for complex underwater scenes, comprising 8,171 real-world images, and the first to systematically integrate multi-dimensional underwater degradation modeling in support of robust recognition research. We propose a cross-architecture evaluation framework that jointly assesses model performance and interpretability, covering 13 state-of-the-art models (e.g., ConvNeXt, ViT) with attribution analysis via Grad-CAM and LIME. ConvNeXt performs best on AQUA20, with a top-1 accuracy of 90.69%, a top-3 accuracy of 98.82%, and an F1-score of 88.92%; the remaining errors point to generalization bottlenecks in real-world settings. The dataset is publicly released to advance standardized evaluation of underwater vision algorithms.
Abstract
Robust visual recognition in underwater environments remains a significant challenge because complex distortions such as turbidity, low illumination, and occlusion severely degrade the performance of standard vision systems. This paper introduces AQUA20, a comprehensive benchmark dataset of 8,171 underwater images spanning 20 marine species. The images reflect real-world environmental challenges, including poor illumination, turbidity, and occlusion, making the dataset a valuable resource for underwater visual understanding. Thirteen state-of-the-art deep learning models, ranging from lightweight CNNs (SqueezeNet, MobileNetV2) to larger architectures such as the transformer-based ViT and the transformer-inspired convolutional ConvNeXt, were evaluated to benchmark their performance in classifying marine species under challenging conditions. Our experimental results show that ConvNeXt achieves the best performance, with a Top-1 accuracy of 90.69%, a Top-3 accuracy of 98.82%, and the highest overall F1-score of 88.92%, at a moderately large parameter count. The results for the other benchmarked models demonstrate trade-offs between model complexity and performance. We also provide an extensive explainability analysis using Grad-CAM and LIME to interpret the strengths and pitfalls of the models. Our results reveal substantial room for improvement in underwater species recognition and demonstrate the value of AQUA20 as a foundation for future research in this domain. The dataset is publicly available at: https://huggingface.co/datasets/taufiktrf/AQUA20.
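The reported metrics (Top-1 accuracy, Top-3 accuracy, and F1-score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `top_k_accuracy` and `macro_f1` are hypothetical, the toy scores are invented for illustration only, and macro averaging of the F1-score is an assumption, since the abstract does not state which averaging was used.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def macro_f1(preds: Sequence[int], labels: Sequence[int],
             num_classes: int) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        # A class with no predictions and no true samples contributes 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy data: 4 samples, 3 classes (AQUA20 itself has 20 classes).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
labels = [0, 1, 1, 2]
# Top-1 prediction is the index of the highest score per sample.
preds = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
f1 = macro_f1(preds, labels, num_classes=3)
print(f"top-1={top1:.4f}  top-2={top2:.4f}  macro-F1={f1:.4f}")
```

For AQUA20, the same functions would be called with `k=1` and `k=3` over 20 classes. A gap between Top-1 and Top-3 accuracy such as the one reported for ConvNeXt (90.69% versus 98.82%) means that for many misclassified images the correct species still ranks among the three highest-scoring classes.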