🤖 AI Summary
This study addresses the difficulty of deploying large vision models on resource-constrained edge devices for plant species and disease identification, a bottleneck for biodiversity monitoring and precision agriculture. The authors systematically evaluate knowledge distillation across model architectures (ConvNeXt and Vision Transformer) and training strategies (training from scratch versus pretrained initialization, each with and without distillation). In total, they train and evaluate 70 models on two benchmarks, Pl@ntNet300K-v2 and Deep-Plant-Disease, transferring knowledge from large teachers to lightweight student networks in the distilled configurations. The results indicate that distilled compact models match the performance of significantly larger models at substantially lower computational cost, which makes them a practical option for edge deployment in agricultural and ecological applications.
📝 Abstract
Recent advances in large-scale visual representation learning have significantly improved performance in plant species and plant disease recognition tasks. However, state-of-the-art models, often based on high-capacity vision transformers or multimodal foundation models, remain computationally expensive and difficult to deploy in resource-constrained environments such as mobile or edge devices. This limitation hinders the scalability of automated biodiversity monitoring and precision agriculture systems, where efficiency is as critical as accuracy. In this work, we investigate knowledge distillation as an approach to transfer the representational capacity of large pretrained models into smaller, more efficient architectures. We focus on plant species and disease recognition, and conduct an extensive empirical study on two challenging benchmarks: Pl@ntNet300K-v2 and Deep-Plant-Disease. We evaluate four representative architectures (two ConvNeXt models and two vision transformers) under multiple training regimes: from-scratch training and pretrained initialization, each with and without distillation. In total, we train and evaluate 70 models. Our results show that knowledge distillation consistently improves performance across tasks and architectures. Distilled models can match the performance of significantly larger models at substantially lower computational cost. These findings demonstrate the potential of knowledge distillation to enable efficient and scalable deployment of plant recognition systems in real-world environmental applications.
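
The abstract does not state which distillation objective is used. As an illustration only, the sketch below shows one common formulation, the soft-target loss that blends hard-label cross-entropy with a temperature-scaled KL divergence between teacher and student predictions. The function names and the `temperature` and `alpha` values are assumptions chosen for this example, not settings reported by the authors.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Soft-target distillation loss for a single example.

    `temperature` and `alpha` are hypothetical defaults for illustration;
    the source does not report the values or the exact objective used.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_soft, p_student_soft) if t > 0)

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Toy three-class example with made-up logits.
loss = distillation_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0], label=0)
print(f"distillation loss: {loss:.4f}")
```

In this formulation, `alpha` trades off fitting the ground-truth label against imitating the teacher, and a higher `temperature` exposes more of the teacher's relative confidence across non-target classes, which can be informative when many plant species look alike.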