YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address computational resource misallocation in real-time object detection (RTOD), where static, dense computation is redundant on simple scenes and insufficient on complex ones, this paper proposes an instance-conditioned adaptive computation framework. The method introduces: (1) a lightweight dynamic routing network that governs an Efficient Sparse Mixture-of-Experts (ES-MoE) module, enabling expert specialization and complementary modeling; (2) a scene-complexity-aware expert activation mechanism that jointly optimizes accuracy and inference efficiency; and (3) a compact architecture integrating a YOLO-style backbone with a dedicated Transformer. On MS COCO, the approach achieves 42.4% AP at 1.62 ms latency, improving mAP by 0.8% and running 17.8% faster than YOLOv13-N, with the largest gains on dense scenes and no loss of real-time performance in typical settings.
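
The reported deltas can be turned back into the baseline figures they imply. The sketch below is only arithmetic on the numbers quoted above. It assumes that "17.8% faster" means a 17.8% reduction in latency and that the 0.8% mAP gain is an absolute difference; the function name `implied_baseline` is made up for illustration.

```python
# Figures quoted in the summary.
ap_master = 42.4           # % AP on MS COCO
latency_master_ms = 1.62   # ms per image
ap_gain = 0.8              # gain over YOLOv13-N, read as an absolute difference (assumption)
latency_reduction = 0.178  # "17.8% faster", read as a latency reduction (assumption)


def implied_baseline(ap, latency_ms, gain, reduction):
    """Back out the baseline AP and latency implied by the reported deltas."""
    baseline_ap = ap - gain
    baseline_latency_ms = latency_ms / (1.0 - reduction)
    return baseline_ap, baseline_latency_ms


base_ap, base_latency_ms = implied_baseline(
    ap_master, latency_master_ms, ap_gain, latency_reduction
)
print(f"implied YOLOv13-N AP:      {base_ap:.1f}%")            # 41.6%
print(f"implied YOLOv13-N latency: {base_latency_ms:.2f} ms")  # 1.97 ms
```

If "17.8% faster" were instead read as a 17.8% throughput increase, the implied baseline latency would be 1.62 × 1.178 ≈ 1.91 ms, so the interpretation matters when comparing against other reported numbers.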

📝 Abstract
Existing Real-Time Object Detection (RTOD) methods commonly adopt YOLO-like architectures for their favorable trade-off between accuracy and speed. However, these models rely on static dense computation that applies uniform processing to all inputs, misallocating representational capacity and computational resources: they over-allocate on trivial scenes while under-serving complex ones. This mismatch results in both computational redundancy and suboptimal detection performance. To overcome this limitation, we propose YOLO-Master, a novel YOLO-like framework that introduces instance-conditional adaptive computation for RTOD. This is achieved through an Efficient Sparse Mixture-of-Experts (ES-MoE) block that dynamically allocates computational resources to each input according to its scene complexity. At its core, a lightweight dynamic routing network guides expert specialization during training through a diversity-enhancing objective, encouraging complementary expertise among experts. Additionally, the routing network adaptively learns to activate only the most relevant experts, thereby improving detection performance while minimizing computational overhead during inference. Comprehensive experiments on five large-scale benchmarks demonstrate the superiority of YOLO-Master. On MS COCO, our model achieves 42.4% AP with 1.62 ms latency, outperforming YOLOv13-N by +0.8% mAP with 17.8% faster inference. Notably, the gains are most pronounced on challenging dense scenes, while the model preserves efficiency on typical inputs and maintains real-time inference speed. Code will be available.
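
The abstract does not spell out the diversity-enhancing objective. As a stand-in, and not the paper's loss, the sketch below uses the load-balancing auxiliary loss popularized by Switch Transformer, N · Σᵢ fᵢ · Pᵢ, where fᵢ is the fraction of inputs routed to expert i and Pᵢ is the mean router probability for expert i. It shows how a training signal can penalize a router that collapses onto one expert. All names and toy values here are assumptions for illustration.

```python
def load_balance_loss(router_probs, assignments, num_experts):
    """Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    router_probs: one probability list per input (each sums to 1).
    assignments:  index of the expert chosen (top-1) for each input.
    The value is 1.0 for perfectly uniform routing and grows as routing
    concentrates on fewer experts.
    """
    n = len(router_probs)
    # f_i: fraction of inputs dispatched to expert i.
    frac = [0.0] * num_experts
    for a in assignments:
        frac[a] += 1.0 / n
    # P_i: mean router probability assigned to expert i.
    mean_prob = [sum(row[i] for row in router_probs) / n for i in range(num_experts)]
    return num_experts * sum(f * p for f, p in zip(frac, mean_prob))


# Toy data (hypothetical): 4 inputs, 2 experts.
balanced = load_balance_loss([[0.5, 0.5]] * 4, [0, 1, 0, 1], num_experts=2)
collapsed = load_balance_loss([[0.9, 0.1]] * 4, [0, 0, 0, 0], num_experts=2)
print(balanced, collapsed)  # balanced routing scores lower than collapsed routing
```

Minimizing a term like this alongside the detection loss pushes the router to spread inputs across experts, which is one precondition for the experts to develop complementary specializations.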

Problem

Research questions and friction points this paper is trying to address.

Dynamic allocation of computational resources based on scene complexity.
Reducing computational redundancy in real-time object detection.
Improving detection performance in dense scenes while maintaining speed.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses ES-MoE blocks for dynamic resource allocation per input.
Employs a lightweight routing network to activate only the most relevant experts.
Enhances detection in dense scenes while maintaining real-time speed.
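
The points above describe sparse expert activation driven by a lightweight router. The following is a minimal sketch of generic top-k Mixture-of-Experts routing in plain Python, not the paper's ES-MoE block: the linear router, the toy "scaling" experts, and every name in the code are assumptions chosen to keep the example self-contained.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_route(logits, k):
    """Return the indices of the k highest-scoring experts and their renormalized weights."""
    probs = softmax(logits)
    idx = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in idx)
    return idx, [probs[i] / total for i in idx]


def sparse_moe(x, router_weights, experts, k):
    """Run only the k experts selected by a linear router and mix their outputs."""
    # One logit per expert: dot product of that expert's router weights with x.
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    idx, gates = topk_route(logits, k)
    out = [0.0] * len(x)
    for i, g in zip(idx, gates):
        y = experts[i](x)  # experts that were not selected are never called
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, idx


calls = []  # records which toy experts actually ran


def make_expert(scale):
    """Hypothetical expert: scales its input and logs that it was executed."""
    def expert(x):
        calls.append(scale)
        return [scale * v for v in x]
    return expert


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]
out, chosen = sparse_moe([1.0, 2.0], router_weights, experts, k=2)
print(chosen, len(calls))  # [1, 3] 2
```

With four experts and k = 2, only two expert functions execute per input, so expert compute scales with k rather than with the total number of experts; the router itself adds one small dot product per expert.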