🤖 AI Summary
Early detection of drought stress is critical for minimizing crop losses, yet the phenotypic changes involved are subtle and call for non-invasive aerial imaging and advanced modeling. This paper proposes an interpretable Vision Transformer (ViT)-driven framework tailored for potato crops. We introduce two novel architectures: a ViT-SVM hybrid model (the first integration of ViT with SVM for agricultural stress recognition) and an end-to-end ViT classifier. Leveraging transfer learning and multispectral/RGB drone imagery, our method uses attention maps to localize key stress indicators, including leaf wilting and canopy texture degradation. Experimental results demonstrate significant improvements in detection accuracy and provide full interpretability of model decisions through visualized attention mechanisms. The framework enables real-time, trustworthy drought monitoring and management in field conditions.
📝 Abstract
Early detection of drought stress is critical for taking timely measures to reduce crop loss before the impact of drought becomes irreversible. Non-invasive imaging techniques capture the subtle phenotypic and physiological changes that occur in response to drought stress, and the resulting imaging data serve as a valuable resource for machine learning methods that identify drought stress. While convolutional neural networks (CNNs) are in wide use, vision transformers (ViTs) present a promising alternative for capturing long-range dependencies and intricate spatial relationships, thereby enhancing the detection of subtle indicators of drought stress. We propose an explainable deep learning pipeline that leverages ViTs for drought stress detection in potato crops using aerial imagery. We applied two distinct approaches: (1) a combination of ViT and a support vector machine (SVM), in which the ViT extracts intricate spatial features from aerial images and the SVM classifies the crops as stressed or healthy; and (2) an end-to-end approach that uses a dedicated classification layer within the ViT to detect drought stress directly. We explain the ViT model's decision-making process by visualizing attention maps, which highlight the specific spatial features within the aerial images that the model focuses on as the drought stress signature. Our findings demonstrate that the proposed methods not only achieve high accuracy in drought stress identification but also shed light on the diverse, subtle plant features associated with drought stress. This offers a robust and interpretable solution for drought stress monitoring that enables farmers to make informed decisions for improved crop management.
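The abstract does not say how the attention maps are computed, so the following is only a minimal sketch of one common option: reading the attention that the classification (CLS) token pays to each image patch in a single self-attention head and reshaping it into a patch grid. The function name `cls_attention_map`, the random projection matrices, and the 4x4 grid are illustrative assumptions; in practice the tokens and weights would come from a trained ViT rather than a random generator.

```python
import numpy as np


def cls_attention_map(tokens, w_q, w_k, grid_size):
    """Return the CLS-token attention over image patches as a 2-D grid.

    tokens: (1 + grid_size**2, dim) array; row 0 is the CLS token.
    w_q, w_k: (dim, head_dim) query/key projections for one attention head.
    (All inputs here are hypothetical stand-ins for a trained model.)
    """
    q = tokens @ w_q
    k = tokens @ w_k
    # Scaled dot-product scores between every pair of tokens.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    # Row 0 is the CLS query; drop its attention to itself (column 0).
    cls_to_patches = attn[0, 1:]
    # Renormalize so the patch weights sum to 1, then lay out as a grid.
    cls_to_patches = cls_to_patches / cls_to_patches.sum()
    return cls_to_patches.reshape(grid_size, grid_size)


# Random data standing in for patch embeddings and learned weights.
rng = np.random.default_rng(0)
grid_size, dim, head_dim = 4, 16, 8
tokens = rng.normal(size=(1 + grid_size**2, dim))
w_q = rng.normal(size=(dim, head_dim))
w_k = rng.normal(size=(dim, head_dim))

heatmap = cls_attention_map(tokens, w_q, w_k, grid_size)
```

Upsampling such a grid to the image resolution and overlaying it on the aerial image gives a heatmap of the regions the model weighted most. Whether the authors use a single head, an average over heads, or another aggregation is not stated in the abstract.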