🤖 AI Summary
TinyML faces a systemic bottleneck: deploying high-accuracy person detection on ultra-low-power devices is limited by the scarcity of large-scale, high-quality training data. To address this, we introduce Wake Vision, the first TinyML-specific dataset for person detection, comprising over 6 million images in two complementary versions: “Large” (optimized for scale) and “Quality” (optimized for label accuracy). We curate human-verified validation and test sets, reducing label error rates from 7.8% to 2.2%, a 5.6 percentage point improvement. We also establish a robustness benchmark covering five realistic scenarios, including varying illumination, subject distance, and demographic diversity. Combining data-quality filtering, knowledge distillation–based pretraining, and a TinyML-adapted evaluation framework, our approach improves detection accuracy by 1.93% on representative models. All data, code, and models are publicly released under the CC-BY 4.0 license.
📝 Abstract
Tiny machine learning (TinyML) for low-power devices lacks robust datasets for development. We present Wake Vision, a large-scale dataset for person detection containing over 6 million quality-filtered images. We provide two variants, Wake Vision (Large) and Wake Vision (Quality), using the large variant for pretraining and knowledge distillation, while the higher-quality labels drive final model performance. Our manually labeled validation and test sets reduce label error rates from 7.8% to 2.2% relative to previous standards. In addition, we introduce five fine-grained benchmark sets to evaluate model performance in real-world scenarios, including varying lighting conditions, camera distances, and demographic characteristics. Training with Wake Vision improves accuracy by 1.93% over existing datasets, demonstrating the importance of dataset quality for low-capacity models and of dataset size for high-capacity models. The dataset, benchmarks, code, and models are available under the CC-BY 4.0 license and are maintained by the Edge AI Foundation.