You Sense Only Once Beneath: Ultra-Light Real-Time Underwater Object Detection

📅 2025-04-22
🤖 AI Summary
Underwater images suffer from severe color distortion and low image quality, and edge devices offer limited compute, which together make it hard to achieve high detection accuracy and high speed at the same time. To address this, the paper proposes an ultra-lightweight real-time object detection framework, You Sense Only Once Beneath (YSOOB). Methodologically, it introduces a Multi-Spectrum Wavelet Encoder (MSWE) that encodes the input image in the frequency domain to reduce the semantic loss caused by color distortion, revisits even-sized and transposed convolutions so that the model can dynamically select and enhance key information during resampling, and combines channel compression with a reconstructed large kernel convolution (RLKC) to remove model redundancy. The resulting model contains only 1.2M parameters and achieves mAP50 scores of 83.1% and 82.9% on the URPC2020 and DUO benchmarks, respectively, comparable to current SOTA detectors. With TensorRT FP16, it attains 781.3 FPS on an NVIDIA T4 GPU and 57.8 FPS on a Jetson Xavier NX, outperforming YOLOv12-N in inference speed by 28.1% and 22.5%, respectively.
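The frequency-domain encoding mentioned in the summary can be illustrated with a generic wavelet transform. The sketch below is not the paper's MSWE, whose internals are not described on this page; it assumes a single-level orthonormal 2D Haar transform applied independently to each color channel, and the names `haar_dwt2` and `wavelet_encode` are hypothetical.

```python
import numpy as np

def haar_dwt2(channel: np.ndarray) -> list:
    """Single-level orthonormal 2D Haar transform of one (H, W) channel.

    Assumes H and W are even. Returns four (H/2, W/2) sub-bands:
    the low-pass approximation followed by three detail bands.
    """
    a = channel[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = channel[0::2, 1::2]  # top-right
    c = channel[1::2, 0::2]  # bottom-left
    d = channel[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0        # low-pass in both directions
    detail_cols = (a - b + c - d) / 2.0   # change across columns
    detail_rows = (a + b - c - d) / 2.0   # change across rows
    detail_diag = (a - b - c + d) / 2.0   # diagonal change
    return [approx, detail_cols, detail_rows, detail_diag]

def wavelet_encode(image: np.ndarray) -> np.ndarray:
    """Encode an (H, W, C) image as an (H/2, W/2, 4*C) stack of sub-bands.

    Illustrative assumption only: each color channel is transformed
    separately and the sub-bands are concatenated along the last axis.
    """
    image = image.astype(np.float64)
    bands = []
    for ch in range(image.shape[2]):
        bands.extend(haar_dwt2(image[:, :, ch]))
    return np.stack(bands, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((8, 8, 3))          # stand-in for an RGB underwater image
    enc = wavelet_encode(img)
    print(enc.shape)                      # spatial size halves, channels x4
    # The orthonormal transform preserves total signal energy.
    print(np.isclose((enc ** 2).sum(), (img ** 2).sum()))
```

One reason such a representation may help is that it separates slowly varying content, where a global color cast tends to concentrate, from edges and texture, while halving the spatial resolution without discarding information.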

📝 Abstract
Despite remarkable achievements in object detection, detector accuracy and efficiency still require improvement under challenging underwater conditions, such as low image quality and limited computational resources. To address this, we propose an Ultra-Light Real-Time Underwater Object Detection framework, You Sense Only Once Beneath (YSOOB). Specifically, we use a Multi-Spectrum Wavelet Encoder (MSWE) to perform frequency-domain encoding on the input image, minimizing the semantic loss caused by underwater optical color distortion. Furthermore, we revisit the unique characteristics of even-sized and transposed convolutions, allowing the model to dynamically select and enhance key information during resampling, thereby improving its generalization ability. Finally, we eliminate model redundancy through a simple yet effective channel compression scheme and a reconstructed large kernel convolution (RLKC), making the model lightweight. The result is YSOOB, a high-performance underwater object detector with only 1.2 million parameters. Extensive experimental results demonstrate that, with the fewest parameters, YSOOB achieves mAP50 of 83.1% and 82.9% on the URPC2020 and DUO datasets, respectively, comparable to current SOTA detectors. The inference speed reaches 781.3 FPS on the T4 GPU (TensorRT FP16) and 57.8 FPS on the Jetson Xavier NX edge computing device (TensorRT FP16), surpassing YOLOv12-N by 28.1% and 22.5%, respectively.
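To put the reported throughput in more concrete terms, the short sketch below converts FPS to per-frame latency and derives the YOLOv12-N throughput implied by the stated speedups. It assumes "surpassing by X%" means the YSOOB FPS equals the baseline FPS times (1 + X/100); the derived baseline figures are therefore estimates under that reading, not numbers reported on this page. The function names are hypothetical.

```python
def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps

def implied_baseline_fps(fps: float, speedup_percent: float) -> float:
    """Baseline FPS implied by a relative speedup.

    Assumes fps = baseline * (1 + speedup_percent / 100).
    """
    return fps / (1.0 + speedup_percent / 100.0)

# (reported YSOOB FPS, reported speedup over YOLOv12-N in percent)
reported = {
    "T4 GPU (TensorRT FP16)": (781.3, 28.1),
    "Jetson Xavier NX (TensorRT FP16)": (57.8, 22.5),
}

if __name__ == "__main__":
    for device, (fps, gain) in reported.items():
        print(
            f"{device}: {latency_ms(fps):.2f} ms/frame, "
            f"implied YOLOv12-N throughput ~{implied_baseline_fps(fps, gain):.1f} FPS"
        )
```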
Problem

Research questions and friction points this paper is trying to address.

Improves accuracy in low-quality underwater images
Reduces computational load for real-time detection
Minimizes model size while maintaining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Spectrum Wavelet Encoder for frequency-domain encoding
Dynamic selection of key information during resampling, using even-sized and transposed convolutions
Channel compression with reconstructed large kernel convolution
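As background for the resampling item in the list above, the following is a minimal NumPy sketch of a plain single-channel transposed convolution. It is not the paper's dynamic selection mechanism, which this page does not detail. It only illustrates a commonly cited property: with stride 2, an even kernel size gives every interior output pixel the same number of contributions, whereas an odd kernel gives uneven overlap. The function names and the all-ones kernels are assumptions chosen for illustration.

```python
import numpy as np

def transposed_conv2d(x: np.ndarray, kernel: np.ndarray,
                      stride: int = 2, padding: int = 1) -> np.ndarray:
    """Single-channel 2D transposed convolution with a square kernel.

    Each input pixel scatters a scaled copy of the kernel into the output;
    `padding` rows and columns are then cropped from every border.
    """
    h, w = x.shape
    k = kernel.shape[0]
    full = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            top, left = i * stride, j * stride
            full[top:top + k, left:left + k] += x[i, j] * kernel
    if padding > 0:
        full = full[padding:-padding, padding:-padding]
    return full

def output_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one axis (no dilation, no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel

if __name__ == "__main__":
    ones = np.ones((4, 4))
    even = transposed_conv2d(ones, np.ones((4, 4)), stride=2, padding=1)
    odd = transposed_conv2d(ones, np.ones((3, 3)), stride=2, padding=0)
    print(even.shape, output_size(4, 4, 2, 1))  # 4x4 kernel doubles the size
    print(np.unique(even[1:-1, 1:-1]))          # uniform interior coverage
    print(np.unique(odd[1:-1, 1:-1]))           # uneven interior coverage
```

With a uniform input, the even kernel yields a constant interior, while the odd kernel yields a repeating pattern of different values, which is the overlap imbalance often associated with checkerboard artifacts in upsampling layers.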
Jun Dong
South China Normal University
Computer Vision
Wenli Wu
School of Data Science and Engineering, and Xingzhi College, South China Normal University, Shanwei
Jintao Cheng
School of Physics, South China Normal University, Guangzhou
Xiaoyu Tang
School of Electronics and Information Engineering, and Xingzhi College, South China Normal University, Shanwei