DDD: Discriminative Difficulty Distance for plant disease diagnosis

📅 2025-01-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In plant disease diagnosis, reported performance is often inflated when training and test data come from the same domain. The task itself is also hard: symptoms are fine-grained and ambiguous, and image features vary widely within each domain. To address this, the authors propose Discriminative Difficulty Distance (DDD), a novel metric that quantifies the domain gap between training and test datasets while also assessing the classification difficulty of the test data. DDD is computed from low-dimensional representations produced by an image encoder trained on multiple sources (ImageNet21K plus multi-crop disease datasets), and it remains informative even for crops and disease classes the encoder was not trained on. Evaluated on 244,063 images spanning four crops, 34 disease classes, and 27 domains, the distance correlates strongly with the diagnostic difficulty indicated by an independently developed classifier (correlation up to 0.909), which is 0.106–0.485 higher than the correlation obtained with an encoder pre-trained only on ImageNet21K. The metric helps expose data-split bias and insufficient training-data diversity, supporting the development of more robust diagnostic models.

📝 Abstract

Recent studies on plant disease diagnosis using machine learning (ML) have raised concerns that diagnostic performance is overestimated because of inappropriate data partitioning, in which training and test datasets are derived from the same source (domain). Plant disease diagnosis is a challenging classification task, characterized by its fine-grained nature, vague symptoms, and the extensive variability of image features within each domain. In this study, we propose the concept of Discriminative Difficulty Distance (DDD), a novel metric designed to quantify the domain gap between training and test datasets while assessing the classification difficulty of the test data. DDD provides a tool for identifying insufficient diversity in training data, thus supporting the development of more diverse and robust datasets. We investigated multiple image encoders trained on different datasets and examined whether the distances between datasets, measured using low-dimensional representations generated by the encoders, are suitable as a DDD metric. The study used 244,063 plant disease images spanning four crops and 34 disease classes, collected from 27 domains. We demonstrated that incorporating plant disease images into encoder training yields a dataset distance measure that strongly correlates with the diagnostic difficulty indicated by an independently developed disease classifier, even when the test images come from crops or diseases different from those used to train the encoder. Compared to the base encoder, pre-trained only on ImageNet21K, the correlation was higher by 0.106 to 0.485, reaching a maximum of 0.909.
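The evaluation idea in the abstract (measure a distance between training and test datasets in an encoder's representation space, then check how well it tracks diagnostic difficulty) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not specify the distance function, so a plain Euclidean distance between mean embeddings stands in for it; the embeddings are synthetic Gaussian vectors rather than encoder outputs; and the per-domain error rates are hypothetical numbers, not results from the paper.

```python
import numpy as np


def centroid_distance(train_emb: np.ndarray, test_emb: np.ndarray) -> float:
    """Euclidean distance between the mean embeddings of two datasets.

    Illustrative stand-in only; the paper's actual DDD distance may differ.
    """
    return float(np.linalg.norm(train_emb.mean(axis=0) - test_emb.mean(axis=0)))


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(0)

# Synthetic "encoder" embeddings for the training domain (500 images, 16 dims).
train_emb = rng.normal(0.0, 1.0, size=(500, 16))

# Four synthetic test domains, each shifted further from the training domain.
shifts = [0.0, 0.5, 1.0, 2.0]
test_sets = [rng.normal(s, 1.0, size=(200, 16)) for s in shifts]

# One distance per test domain.
distances = [centroid_distance(train_emb, t) for t in test_sets]

# Hypothetical error rates of an independently trained classifier per domain.
error_rates = [0.05, 0.12, 0.25, 0.40]

# A useful difficulty distance should correlate with these error rates.
r = pearson_r(distances, error_rates)
print(f"distances: {[round(d, 2) for d in distances]}")
print(f"correlation with error rate: {r:.3f}")
```

With real data, `train_emb` and each entry of `test_sets` would be replaced by the low-dimensional representations produced by the encoder under study, and `error_rates` by the measured difficulty of each test domain; comparing `r` across encoders mirrors the comparison the abstract reports.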

Problem

Research questions and friction points this paper is trying to address.

Quantify domain gap in datasets

Assess classification difficulty

Enhance dataset diversity and robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Discriminative Difficulty Distance metric

Low-dimensional image encoder representations

Cross-domain plant disease image analysis
