🤖 AI Summary
This work addresses the challenge of achieving both high accuracy and computational efficiency for real-time scene detection on resource-constrained mobile devices. To this end, we propose a novel three-stage cyclic training framework that integrates exploration and stabilization mechanisms. We pioneer the incorporation of semi-supervised domain adaptation (SSDA) into mobile-device training, enabling synergistic exploitation of knowledge from large pre-trained models and unlabeled target-domain data. Coupled with a lightweight network architecture and CPU-optimized inference, our approach achieves 94.00% Top-1 and 99.17% Top-3 accuracy on the CamSSD dataset, with only 1.61 ms latency per frame on CPU, satisfying stringent on-device real-time requirements. Our core contributions are: (i) the first SSDA paradigm explicitly designed for mobile deployment; (ii) a scalable, cyclic training framework; and (iii) an end-to-end lightweight solution delivering state-of-the-art accuracy–latency trade-offs.
📝 Abstract
Smartphones are now ubiquitous, and the rapid development of AI has spurred extensive research on applying deep learning techniques to image classification. However, the limited resources available on mobile devices pose significant challenges in balancing accuracy with computational efficiency. In this paper, we propose a novel training framework called Cycle Training, which adopts a three-stage training process that alternates between exploration and stabilization phases to optimize model performance. Additionally, we incorporate Semi-Supervised Domain Adaptation (SSDA) to leverage the power of large pre-trained models and unlabeled data, thereby effectively expanding the training dataset. Comprehensive experiments on the CamSSD dataset for mobile scene detection demonstrate that our framework not only significantly improves classification accuracy but also ensures real-time inference efficiency. Specifically, our method achieves 94.00% Top-1 accuracy and 99.17% Top-3 accuracy, and runs inference in just 1.61 ms on CPU, demonstrating its suitability for real-world mobile deployment.
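To make the alternation between exploration and stabilization phases concrete, the following is a minimal, hypothetical sketch of a three-stage cyclic schedule. The stage count, phase lengths, and learning rates here are illustrative assumptions for exposition, not values taken from the paper.

```python
# Hypothetical sketch: a three-stage cyclic training schedule that
# alternates an exploration phase (higher learning rate) with a
# stabilization phase (lower learning rate) in every stage.
# All hyperparameter values below are illustrative assumptions.

def cycle_schedule(num_stages=3, epochs_per_phase=5,
                   explore_lr=1e-2, stabilize_lr=1e-3):
    """Yield (stage, phase, epoch, lr) tuples, one per training epoch."""
    for stage in range(num_stages):
        for phase, lr in (("explore", explore_lr),
                          ("stabilize", stabilize_lr)):
            for epoch in range(epochs_per_phase):
                yield stage, phase, epoch, lr

# Materialize the full schedule: 3 stages x 2 phases x 5 epochs = 30 epochs.
schedule = list(cycle_schedule())
```

In an actual training loop, each yielded tuple would drive one epoch of optimizer updates; the exploration phase searches the loss landscape aggressively, while the stabilization phase consolidates what was found.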