🤖 AI Summary
This study investigates the efficacy and out-of-distribution robustness of ImageNet pretraining for ultra-lightweight convolutional networks (<1M parameters) in infrared target detection. Motivated by edge-deployment constraints, we propose two families of scalable lightweight backbone architectures grounded in scaling laws, and systematically evaluate the impact of pretraining across multiple infrared detection benchmarks. Results demonstrate that while pretraining consistently improves accuracy, its robustness gains exhibit a sharp capacity threshold: below this threshold, cross-domain generalization deteriorates significantly, and pretraining cannot compensate for architectural limitations. This work is the first to reveal a nonlinear relationship between pretraining benefits and model capacity in infrared vision downstream tasks. It provides both theoretical insight and practical guidance for lightweight model design under resource constraints, specifically cautioning against excessive backbone compression in order to preserve deployment stability.
📝 Abstract
Many real-world applications require recognition models that are robust across operational conditions and modalities, yet at the same time run on small embedded devices with limited hardware. While pretraining is known to substantially improve both the accuracy and the robustness of standard-size models, its effect on the small models suited to embedded and edge devices remains unclear. In this work, we investigate the effect of ImageNet pretraining on increasingly small backbone architectures (ultra-small models with $<$1M parameters) with respect to robustness in downstream object detection tasks in the infrared visual modality. Using scaling laws derived from standard object recognition architectures, we construct two ultra-small backbone families and systematically study their performance. Our experiments on three different datasets reveal that while ImageNet pretraining is still useful, below a certain capacity threshold it offers diminishing returns in terms of out-of-distribution detection robustness. We therefore advise practitioners to keep using pretraining and, when possible, to avoid overly small models: although they may work well for in-domain problems, they are brittle when working conditions differ.