🤖 AI Summary
Underwater fish detection (UFD) faces significant challenges, including low target resolution, strong background clutter, and high visual similarity between fish and their aquatic environments. Existing approaches often rely on complex attention mechanisms or localized enhancement modules, which increases model complexity and reduces inference efficiency. To address these issues, the authors propose EPANet, an efficient path aggregation network built from two components: (1) an efficient path aggregation feature pyramid network (EPA-FPN), which adds long-range skip connections across disparate scales to improve semantic-spatial complementarity and uses cross-layer fusion paths to make feature integration more efficient; and (2) a multi-scale diverse-division short path bottleneck (MS-DDSP bottleneck), which extends the conventional bottleneck with finer-grained feature division and diverse convolutional operations to enrich local feature diversity. Experiments on benchmark UFD datasets show that EPANet outperforms state-of-the-art methods in detection accuracy and inference speed while keeping a comparable or lower parameter count.
📝 Abstract
Underwater fish detection (UFD) remains a challenging task in computer vision due to low object resolution, significant background interference, and high visual similarity between targets and surroundings. Existing approaches primarily focus on local feature enhancement or incorporate complex attention mechanisms to highlight small objects, often at the cost of increased model complexity and reduced efficiency. To address these limitations, we propose an efficient path aggregation network (EPANet), which leverages complementary feature integration to achieve accurate and lightweight UFD. EPANet consists of two key components: an efficient path aggregation feature pyramid network (EPA-FPN) and a multi-scale diverse-division short path bottleneck (MS-DDSP bottleneck). The EPA-FPN introduces long-range skip connections across disparate scales to improve semantic-spatial complementarity, while cross-layer fusion paths are adopted to enhance feature integration efficiency. The MS-DDSP bottleneck extends the conventional bottleneck structure by introducing finer-grained feature division and diverse convolutional operations, thereby increasing local feature diversity and representation capacity. Extensive experiments on benchmark UFD datasets demonstrate that EPANet outperforms state-of-the-art methods in terms of detection accuracy and inference speed, while maintaining comparable or even lower parameter complexity.
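The abstract gives no layer-level details, so the following PyTorch sketch only illustrates the two ideas it names and should not be read as the authors' implementation. `DividedBottleneck`, `long_range_skip_fuse`, the choice of kernel sizes, the average-pool resizing, and the assumption that both feature maps already share a channel count are all hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DividedBottleneck(nn.Module):
    """Illustrative only: split channels into groups and give each group
    a different convolution, then fuse and add a residual connection."""

    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0, "channels must divide evenly"
        width = channels // len(kernel_sizes)
        # Assumed form of "diverse convolutional operations": one kernel size per group.
        self.branches = nn.ModuleList(
            [nn.Conv2d(width, width, k, padding=k // 2) for k in kernel_sizes]
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Feature division: one equal-width channel chunk per branch.
        chunks = torch.chunk(x, len(self.branches), dim=1)
        mixed = torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)
        return x + self.fuse(mixed)


def long_range_skip_fuse(shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
    """Illustrative long-range skip: bring a high-resolution shallow map down
    to the deep map's spatial size and add it (channel counts assumed equal)."""
    resized = F.adaptive_avg_pool2d(shallow, tuple(deep.shape[-2:]))
    return deep + resized


if __name__ == "__main__":
    shallow = torch.randn(2, 16, 32, 32)  # fine spatial detail
    deep = torch.randn(2, 16, 8, 8)       # coarse, more semantic
    block = DividedBottleneck(16)
    print(block(shallow).shape)                       # same shape as the input
    print(long_range_skip_fuse(shallow, deep).shape)  # same shape as `deep`
```

In this sketch the bottleneck preserves the input shape, so it could replace a standard residual block, and the skip fusion passes spatial detail from a shallow stage directly to a much deeper one rather than only between adjacent pyramid levels.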