🤖 AI Summary
This work addresses the limitations of existing ship detection datasets, particularly their insufficient scale, low proportion of small targets, and limited scene diversity, which hinder robust algorithm evaluation and generalization studies in complex maritime environments. To this end, we introduce WUTDet, a large-scale benchmark dataset comprising 100,576 images with 381,378 annotated ship instances, spanning diverse operational scenarios such as ports, anchorages, navigation, and berthing, and capturing challenging imaging conditions including fog, glare, low light, and rain. We further present Ship-GEN, a unified cross-dataset test set for evaluating model generalization. We systematically evaluate 20 baseline detectors across three mainstream architectures: CNNs, Transformers, and Mamba. Experiments show that Transformers achieve the highest overall accuracy (AP) and small-object performance (APs), CNNs retain the advantage in inference efficiency, and Mamba strikes a favorable balance between accuracy and computational efficiency. Models trained on WUTDet generalize more strongly on Ship-GEN under different data distributions.
📝 Abstract
Ship detection for navigation is a fundamental perception task in intelligent waterway transportation systems. However, existing public ship detection datasets remain limited in scale, in the proportion of small-object instances, and in scene diversity, which hinders the systematic evaluation of detection algorithms and the study of their generalization in complex maritime environments. To this end, we construct WUTDet, a large-scale ship detection dataset. WUTDet contains 100,576 images and 381,378 annotated ship instances, covering diverse operational scenarios such as ports, anchorages, navigation, and berthing, as well as various imaging conditions including fog, glare, low light, and rain, making it both diverse and challenging. Based on WUTDet, we systematically evaluate 20 baseline models from three mainstream detection architectures, namely CNN, Transformer, and Mamba. Experimental results show that the Transformer architecture achieves superior overall detection accuracy (AP) and small-object detection performance (APs), demonstrating stronger adaptability to complex maritime scenes; the CNN architecture maintains an advantage in inference efficiency, making it more suitable for real-time applications; and the Mamba architecture achieves a favorable balance between detection accuracy and computational efficiency. Furthermore, we construct a unified cross-dataset test set, Ship-GEN, to evaluate model generalization. Results on Ship-GEN show that models trained on WUTDet exhibit stronger generalization under different data distributions. These findings demonstrate that WUTDet provides effective data support for the research, evaluation, and generalization analysis of ship detection algorithms in complex maritime scenarios. The dataset is publicly available at: https://github.com/MAPGroup/WUTDet.
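The abstract compares datasets by their proportion of small-object instances and reports APs. A minimal sketch of how such a proportion could be measured is given below, assuming the widely used COCO size convention (small: area below 32×32 pixels; medium: below 96×96; large: otherwise) and COCO-style `[x, y, width, height]` boxes. Whether WUTDet's annotations use this format or these exact thresholds is an assumption here, and the helper names `size_bucket` and `small_object_ratio` are hypothetical.

```python
from typing import Iterable, Mapping, Sequence

# COCO-style size thresholds in pixels^2 (assumed, not confirmed for WUTDet).
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def size_bucket(width: float, height: float) -> str:
    """Return 'small', 'medium', or 'large' for a box of the given pixel size."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def small_object_ratio(annotations: Iterable[Mapping[str, Sequence[float]]]) -> float:
    """Fraction of annotations whose box falls in the 'small' bucket.

    Each annotation is assumed to carry a 'bbox' entry laid out as
    [x, y, width, height]. Returns 0.0 for an empty collection.
    """
    total = 0
    small = 0
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        total += 1
        if size_bucket(w, h) == "small":
            small += 1
    return small / total if total else 0.0


if __name__ == "__main__":
    # Toy annotations for illustration only; these are not WUTDet data.
    toy = [
        {"bbox": [0, 0, 10, 10]},
        {"bbox": [0, 0, 50, 50]},
        {"bbox": [0, 0, 200, 100]},
        {"bbox": [5, 5, 20, 30]},
    ]
    print(f"small-object ratio: {small_object_ratio(toy):.2f}")
```

Applying the same function to several datasets would give a like-for-like comparison of their small-object proportions, provided every dataset is bucketed with the same thresholds.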