🤖 AI Summary
This paper systematically reviews the decade-long evolution of the YOLO series (v1–v10), addressing the challenge of characterizing technical advances and application expansion in real-time object detection. It proposes a cross-version comparability framework that integrates architectural analysis, complexity-accuracy trade-off evaluation, cross-domain case studies, and standardized benchmarking. The analysis identifies five key evolutionary patterns: deepening modular design, persistent lightweighting, emerging multimodal fusion, growing deployment-oriented ethical awareness, and convergence toward unified training paradigms. It also articulates three future research directions: joint optimization of efficiency and robustness, embodiment-aware adaptation for robotic and interactive systems, and trustworthy AI deployment frameworks. The work delivers an evolutionary roadmap and an open-source, reproducible evaluation benchmark, bridging academic insight with industrial applicability.
📝 Abstract
This review marks the tenth anniversary of You Only Look Once (YOLO), one of the most influential frameworks in real-time object detection. Over the past decade, YOLO has evolved from a streamlined detector into a diverse family of architectures characterized by efficient design, modular scalability, and cross-domain adaptability. The paper presents a technical overview of the main versions, highlights key architectural trends, and surveys the principal application areas in which YOLO has been adopted. It also addresses evaluation practices, ethical considerations, and potential future directions for the framework's continued development. The analysis aims to provide a comprehensive and critical perspective on YOLO's trajectory and ongoing transformation.