🤖 AI Summary
This work addresses a limitation of existing explainable AI (XAI) metrics for evaluating poultry disease sound detection models: they measure faithfulness for a single model and ignore model multiplicity, in which near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can therefore yield explanations that are faithful yet unreliable, because masking-based metrics do not penalize redundant shortcuts. To address this, the authors propose AGRI-Fidelity, a reliability-oriented evaluation framework that combines cross-model consensus with cyclic temporal permutation to construct null distributions. Using statistical inference based on the false discovery rate (FDR), the method suppresses stationary artifacts without requiring spatial ground-truth annotations while preserving temporally localized bioacoustic markers. Experiments on both real-world and controlled datasets indicate that AGRI-Fidelity provides sample-wise, reliability-aware discrimination, in contrast to conventional masking-based metrics.
📝 Abstract
Existing XAI metrics measure faithfulness for a single model, ignoring model multiplicity, where near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can produce explanations that are faithful yet unreliable, as masking-based metrics fail to penalize redundant shortcuts. We propose AGRI-Fidelity, a reliability-oriented evaluation framework for listenable explanations in poultry disease detection without spatial ground truth. The method combines cross-model consensus with cyclic temporal permutation to construct null distributions and compute a False Discovery Rate (FDR), suppressing stationary artifacts while preserving time-localized bioacoustic markers. Across real and controlled datasets, AGRI-Fidelity provides reliability-aware discrimination for every data point, in contrast to masking-based metrics.
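The abstract does not spell out how the null distribution or the FDR is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes an explanation is a binary time mask, that "consensus" is the mean score drop across models when the masked frames are zeroed, and that the FDR step can be stood in for by the standard Benjamini-Hochberg procedure. All names (`mean_energy`, `peak_energy`, `consensus_drop`, `cyclic_null_pvalue`, `benjamini_hochberg`) and the toy signals are hypothetical.

```python
import numpy as np


def mean_energy(x):
    """Toy 'model': scores a clip by average energy (can exploit a constant hum)."""
    return float(np.mean(x))


def peak_energy(x):
    """Toy 'model': scores a clip by its loudest frame (responds to short bursts)."""
    return float(np.max(x))


def consensus_drop(models, x, mask):
    """Mean score drop across models when the frames flagged by `mask` are zeroed."""
    masked = x * (1.0 - mask)
    return float(np.mean([f(x) - f(masked) for f in models]))


def cyclic_null_pvalue(models, x, mask, n_perm, rng):
    """Permutation p-value: does the drop depend on where the mask sits in time?"""
    observed = consensus_drop(models, x, mask)
    shifts = rng.integers(1, len(x), size=n_perm)  # non-zero cyclic shifts
    null = np.array([consensus_drop(models, x, np.roll(mask, s)) for s in shifts])
    # Small tolerance so that exact ties count as "at least as extreme".
    exceed = np.sum(null >= observed - 1e-12)
    return (1.0 + exceed) / (1.0 + n_perm)


def benjamini_hochberg(pvals, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure (assumed stand-in for the FDR step)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        reject[order[: k + 1]] = True
    return reject


# Toy data (assumption): 64 time frames, explanation mask over frames 10..15.
rng = np.random.default_rng(0)
T = 64
mask = np.zeros(T)
mask[10:16] = 1.0

hum_clip = np.ones(T)          # stationary artifact: same energy in every frame
burst_clip = np.full(T, 0.1)   # quiet background ...
burst_clip[10:16] = 1.0        # ... with a time-localized event under the mask

models = [mean_energy, peak_energy]
pvals = [cyclic_null_pvalue(models, clip, mask, 199, rng)
         for clip in (burst_clip, hum_clip)]
reliable = benjamini_hochberg(pvals, alpha=0.05)
print(pvals, reliable)
```

In this toy setup, masking the hum clip does lower the consensus score, so a masking-based metric would call the explanation faithful. Every cyclic shift of the mask lowers the score by the same amount, however, so the p-value is 1.0 and the explanation is not flagged as reliable. For the burst clip, shifted masks miss part of the event, the observed drop exceeds every null draw, and the p-value is 1/200 = 0.005, which survives the Benjamini-Hochberg threshold.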