Small Object Detection in Complex Backgrounds with Multi-Scale Attention and Global Relation Modeling

📅 2026-03-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses small object detection in complex backgrounds, where performance is limited by feature degradation, weak semantic cues, and inaccurate localization. It proposes a multi-level feature enhancement framework with global relation modeling. Residual Haar wavelet downsampling preserves fine structural details, while a global relation module captures long-range dependencies to suppress background interference. A cross-scale hybrid attention mechanism fuses multi-scale features through sparse, aligned interactions, and a center-assisted loss improves localization accuracy. On the RGBT-Tiny benchmark, the method consistently outperforms existing state-of-the-art detectors under both IoU-based and scale-adaptive evaluation metrics.

📝 Abstract

Small object detection under complex backgrounds remains a challenging task due to severe feature degradation, weak semantic representation, and inaccurate localization caused by downsampling operations and background interference. Existing detection frameworks are mainly designed for general objects and often fail to explicitly address the unique characteristics of small objects, such as limited structural cues and strong sensitivity to localization errors. In this paper, we propose a multi-level feature enhancement and global relation modeling framework tailored for small object detection. Specifically, a Residual Haar Wavelet Downsampling module is introduced to preserve fine-grained structural details by jointly exploiting spatial-domain convolutional features and frequency-domain representations. To enhance global semantic awareness and suppress background noise, a Global Relation Modeling module is employed to capture long-range dependencies at high-level feature stages. Furthermore, a Cross-Scale Hybrid Attention module is designed to establish sparse and aligned interactions across multi-scale features, enabling effective fusion of high-resolution details and high-level semantic information with reduced computational overhead. Finally, a Center-Assisted Loss is incorporated to stabilize training and improve localization accuracy for small objects. Extensive experiments conducted on the large-scale RGBT-Tiny benchmark demonstrate that the proposed method consistently outperforms existing state-of-the-art detectors under both IoU-based and scale-adaptive evaluation metrics. These results validate the effectiveness and robustness of the proposed framework for small object detection in complex environments.
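
The abstract describes the Global Relation Modeling module only as capturing long-range dependencies at high-level feature stages. One common way to do that is dot-product self-attention over all spatial positions, and the minimal NumPy sketch below assumes that form. The function name `global_relation`, the identity query/key/value projections, and the residual sum are illustrative assumptions, not the paper's design.

```python
import numpy as np

def global_relation(feat):
    """Mix every spatial position with every other one (assumed self-attention form).

    feat: array of shape (C, H, W).
    Returns (out, weights): out has shape (C, H, W), weights has shape (H*W, H*W).
    """
    feat = np.asarray(feat, dtype=float)
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T              # (N, C), one token per position
    scores = tokens @ tokens.T / np.sqrt(c)        # (N, N) pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)    # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    mixed = weights @ tokens                       # global context per position
    out = tokens + mixed                           # residual connection (assumption)
    return out.T.reshape(c, h, w), weights
```

The attention matrix has (H*W)^2 entries, which is one reason such a module is usually affordable only on low-resolution, high-level feature maps, in line with where the abstract places it.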

Problem

Research questions and friction points this paper is trying to address.

Small Object Detection

Complex Backgrounds

Feature Degradation

Localization Accuracy

Semantic Representation
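
To make the localization-accuracy friction point concrete, the short sketch below compares how the same 2-pixel offset affects IoU for a small and a large square box. The helper names `iou` and `shifted_iou` and the box sizes are chosen here for illustration only; they do not come from the paper.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def shifted_iou(size, shift):
    """IoU between a size x size box and a copy shifted diagonally by `shift`."""
    gt = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), float(shift), float(size + shift), float(size + shift))
    return iou(gt, pred)

small = shifted_iou(8, 2)    # 36 / 92, about 0.39
large = shifted_iou(64, 2)   # 3844 / 4348, about 0.88
```

With an IoU threshold of 0.5, the shifted 8x8 prediction would count as a miss while the shifted 64x64 prediction would still count as a match, which is the sensitivity that localization-oriented losses and scale-adaptive metrics try to account for.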

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Scale Attention

Global Relation Modeling

Haar Wavelet Downsampling
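
As a rough illustration of the Haar wavelet downsampling idea, the NumPy sketch below halves the resolution of a single-channel map with a one-level orthonormal 2D Haar transform, keeping the detail sub-bands instead of discarding them. The residual combination is a guess at the general shape of such a module: the names `haar_downsample` and `residual_haar_downsample`, the `mix` weight vector (standing in for a learned 1x1 convolution), and the average-pooling spatial branch (standing in for a learned strided convolution) are all assumptions, not the paper's architecture.

```python
import numpy as np

def haar_downsample(x):
    """One-level 2D Haar transform of an (H, W) array with even H and W.

    Returns an array of shape (4, H/2, W/2): the low-pass band followed by
    the width, height, and diagonal detail bands.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0
    detail_w = (a - b + c - d) / 2.0  # differences along the width
    detail_h = (a + b - c - d) / 2.0  # differences along the height
    detail_d = (a - b - c + d) / 2.0  # diagonal differences
    return np.stack([low, detail_w, detail_h, detail_d])

def residual_haar_downsample(x, mix):
    """Hypothetical residual form: spatial branch plus mixed frequency bands."""
    x = np.asarray(x, dtype=float)
    bands = haar_downsample(x)
    freq = np.tensordot(np.asarray(mix, dtype=float), bands, axes=1)
    h, w = x.shape
    spatial = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # 2x2 average pool
    return spatial + freq
```

Because the transform is orthonormal, the four sub-bands together carry the same energy as the input, so the downsampling step itself loses no information; any loss comes from how later layers mix the bands.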