🤖 AI Summary
Maritime object detection faces two key challenges: scarcity of annotated data and poor generalization across diverse scenarios—particularly underrepresented ones such as open-sea environments. To address these, we propose a generation-selection collaborative synthetic data augmentation framework. First, we design an X-to-Maritime multimodal diffusion model incorporating bidirectional object-water attention to synthesize diverse, high-fidelity maritime scene images. Second, we introduce an attribute-correlated active sampling strategy that enables task-aware selection of high-quality synthetic samples. The framework end-to-end integrates multimodal conditional generation, attention-based modeling, and active learning. Evaluated on data-scarce open-sea scenarios, our method achieves significant improvements in detection accuracy—up to 12.7% mAP gain over baselines—and establishes a new benchmark for generative vision learning in maritime domains.
📝 Abstract
Maritime object detection is essential for navigation safety, surveillance, and autonomous operations, yet constrained by two key challenges: the scarcity of annotated maritime data and poor generalization across various maritime attributes (e.g., object category, viewpoint, location, and imaging environment). % In particular, models trained on existing datasets often underperform in underrepresented scenarios such as open-sea environments. To address these challenges, we propose Neptune-X, a data-centric generative-selection framework that enhances training effectiveness by leveraging synthetic data generation with task-aware sample selection. From the generation perspective, we develop X-to-Maritime, a multi-modality-conditioned generative model that synthesizes diverse and realistic maritime scenes. A key component is the Bidirectional Object-Water Attention module, which captures boundary interactions between objects and their aquatic surroundings to improve visual fidelity. To further improve downstream tasking performance, we propose Attribute-correlated Active Sampling, which dynamically selects synthetic samples based on their task relevance. To support robust benchmarking, we construct the Maritime Generation Dataset, the first dataset tailored for generative maritime learning, encompassing a wide range of semantic conditions. Extensive experiments demonstrate that our approach sets a new benchmark in maritime scene synthesis, significantly improving detection accuracy, particularly in challenging and previously underrepresented settings.The code is available at https://github.com/gy65896/Neptune-X.