YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy and real-time bottlenecks in detecting small objects within 4K omnidirectional images, which stem from severe geometric distortion, ultra-high resolution, and a wide field of view, this paper proposes a lightweight and efficient detection framework. Methodologically, it introduces a P2 multi-scale detection head to enhance sensitivity to small objects, adopts a GhostConv-based lightweight backbone to significantly reduce parameter count while preserving representational capacity, and integrates the resulting multi-scale features, including the P2 layer, into a YOLO-based architecture with an optimized inference pipeline. The authors also annotate the CVIP360 dataset, producing an open-source benchmark for 4K omnidirectional object detection comprising 6,876 frames with precise bounding-box annotations. Experiments show that the method achieves 0.95 mAP@0.5 IoU on 4K panoramic images with a single-frame inference latency of only 28.3 ms, 75% faster and 4.2 percentage points higher in mAP than YOLO11, delivering both state-of-the-art performance and practical deployability.

📝 Abstract
The processing of omnidirectional 360-degree images poses significant challenges for object detection due to inherent spatial distortions, wide fields of view, and ultra-high-resolution inputs. Conventional detectors such as YOLO are optimised for standard image sizes (for example, 640x640 pixels) and often struggle with the computational demands of 4K or higher-resolution imagery typical of 360-degree vision. To address these limitations, we introduce YOLO11-4K, an efficient real-time detection framework tailored for 4K panoramic images. The architecture incorporates a novel multi-scale detection head with a P2 layer to improve sensitivity to small objects often missed at coarser scales, and a GhostConv-based backbone to reduce computational complexity without sacrificing representational power. To enable evaluation, we manually annotated the CVIP360 dataset, generating 6,876 frame-level bounding boxes and producing a publicly available, detection-ready benchmark for 4K panoramic scenes. YOLO11-4K achieves 0.95 mAP at 0.50 IoU with 28.3 milliseconds inference per frame, representing a 75 percent latency reduction compared to YOLO11 (112.3 milliseconds), while also improving accuracy (mAP at 0.50 of 0.95 versus 0.908). This balance of efficiency and precision enables robust object detection in expansive 360-degree environments, making the framework suitable for real-world high-resolution panoramic applications. While this work focuses on 4K omnidirectional images, the approach is broadly applicable to high-resolution detection tasks in autonomous navigation, surveillance, and augmented reality.
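The abstract attributes the efficiency gain partly to the GhostConv-based backbone, which replaces part of each standard convolution with cheap depthwise "ghost" operations. A minimal parameter-count sketch, assuming the commonly used configuration of a ghost ratio of 2 with 3×3 depthwise cheap operations (these specific hyperparameters are illustrative assumptions, not taken from the paper):

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, k, dw_k=3, ratio=2):
    """Weight count of a Ghost module: a primary k x k convolution
    producing c_out/ratio intrinsic feature maps, plus cheap depthwise
    dw_k x dw_k operations generating the remaining 'ghost' maps."""
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k          # dense primary conv
    cheap = (ratio - 1) * intrinsic * dw_k**2   # depthwise ghost ops
    return primary + cheap

# Illustrative backbone layer: 256 -> 512 channels, 3x3 kernel.
std = conv_params(256, 512, 3)
ghost = ghost_conv_params(256, 512, 3)
print(std, ghost, round(std / ghost, 2))  # 1179648 592128 1.99
```

With a ratio of 2, the module roughly halves the weights of each replaced layer, which is consistent with the abstract's claim of reduced complexity without discarding representational power (the intrinsic maps are still learned densely).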
Problem

Research questions and friction points this paper is trying to address.

Detecting small objects in 4K panoramic images in real time
Reducing computational cost enough for real-time processing of ultra-high-resolution inputs
Handling the geometric distortion and wide field of view inherent in 360-degree vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale detection head with P2 layer
GhostConv-based backbone for reduced complexity
Real-time detection framework for 4K panoramic images
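The P2 head extends the usual YOLO feature pyramid (P3–P5, strides 8–32) with a finer stride-4 level, which matters at 4K because even a stride-8 grid cell covers many pixels of a small object. A quick sketch of the grid sizes per level, assuming a 3840×1920 equirectangular frame (the exact frame size of the dataset is an assumption here):

```python
# Feature-map grid sizes for YOLO pyramid levels on a 4K frame.
# 3840x1920 (2:1 equirectangular) is assumed for illustration.
WIDTH, HEIGHT = 3840, 1920

# Each level downsamples by its stride; P2 (stride 4) is the extra
# fine-grained head added for small-object sensitivity.
levels = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}

for name, stride in levels.items():
    gw, gh = WIDTH // stride, HEIGHT // stride
    print(f"{name}: stride {stride:2d} -> {gw}x{gh} grid "
          f"({gw * gh} cells, one cell spans {stride}px)")
```

At stride 4 the P2 grid is 960×480, so an object only a few pixels wide still occupies at least one cell, whereas at P3 (480×240) it can fall below a single cell and be missed.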
Huma Hafeez
School of Engineering & Technology, University of New South Wales, Canberra, Australia
Matthew Garratt
School of Engineering & Technology, University of New South Wales, Canberra, Australia
Jo Plested
University of New South Wales
Deep Learning, Transfer Learning
Sankaran Iyer
School of Computer Science & Engineering, University of New South Wales, Sydney, Australia
Arcot Sowmya
Professor, University of New South Wales
computer vision, image analysis, machine learning, medical image processing