🤖 AI Summary
Existing multiclass classification models for 12-lead ECG lack interpretability, while current counterfactual explanations suffer from poor sparsity and insufficient physiological plausibility.
Method: We propose a prototype-driven sparse counterfactual explanation framework that integrates R-peak alignment for enhanced temporal stability, SHAP-based thresholding to identify critical regions, dynamic time warping (DTW) combined with median clustering to extract physiologically informed prototypes, and interval-rule conversion to generate clinically actionable local perturbations.
Results: The method achieves 81.3% overall counterfactual validity (98.9% for myocardial infarction), improves temporal stability by 43%, and generates explanations in under one second, enabling near-real-time clinical interaction. This work is the first to jointly leverage R-peak alignment and prototype sparsification for ECG counterfactual generation, significantly enhancing both physiological credibility and deployment practicality.
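The prototype-extraction step described above (DTW distances plus medoid selection) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes single-lead 1-D beats and uses a plain quadratic-time DTW; the function names `dtw_distance` and `medoid_prototype` are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two 1-D signals, with absolute difference as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def medoid_prototype(signals):
    """Return the signal minimizing the total DTW distance to all
    others, i.e. the medoid of the cluster."""
    k = len(signals)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(signals[i], signals[j])
    return signals[int(np.argmin(dist.sum(axis=1)))]
```

Using the medoid (an actual observed beat) rather than a mean is what keeps the prototype physiologically plausible: averaging warped ECG beats can smear QRS morphology, while a medoid is guaranteed to be a real signal from the data.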
📝 Abstract
In eXplainable Artificial Intelligence (XAI), instance-based explanations for time series have gained increasing attention due to their potential for actionable and interpretable insights in domains such as healthcare. Addressing the explainability challenges of state-of-the-art models, we propose a prototype-driven framework for generating sparse counterfactual explanations tailored to 12-lead ECG classification models. Our method employs SHAP-based thresholds to identify critical signal segments and convert them into interval rules, uses Dynamic Time Warping (DTW) and medoid clustering to extract representative prototypes, and aligns these prototypes to query R-peaks for coherence with the sample being explained. The framework generates counterfactuals that modify only 78% of the original signal while maintaining 81.3% validity across all classes and achieving a 43% improvement in temporal stability. We evaluate three variants of our approach (Original, Sparse, and Aligned Sparse), with class-specific validity ranging from 98.9% for myocardial infarction (MI) down to 13.2% for hypertrophy (HYP), which remains challenging. This approach supports near-real-time generation (under one second) of clinically valid counterfactuals and provides a foundation for interactive explanation platforms. Our findings establish design principles for physiologically aware counterfactual explanations in AI-based diagnosis systems and outline pathways toward user-controlled explanation interfaces for clinical deployment.
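Two further steps from the abstract, SHAP-based thresholding into interval rules and R-peak alignment, can be sketched as below. This is a hedged illustration under simplifying assumptions: `critical_intervals` and `align_to_query` are hypothetical names, per-sample SHAP values are assumed to be precomputed, and the R-peak is approximated by the signal's argmax (a real pipeline would use a proper detector such as Pan-Tompkins).

```python
import numpy as np

def critical_intervals(shap_vals, threshold):
    """Convert per-timestep SHAP attributions into (start, end) index
    intervals where |SHAP| exceeds the threshold; only these intervals
    are perturbed, which is what makes the counterfactual sparse."""
    mask = np.abs(shap_vals) > threshold
    intervals, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i                      # interval opens
        elif not m and start is not None:
            intervals.append((start, i))   # interval closes
            start = None
    if start is not None:                  # interval runs to the end
        intervals.append((start, len(mask)))
    return intervals

def align_to_query(prototype, query):
    """Circularly shift the prototype so its dominant peak (crude
    R-peak proxy via argmax) coincides with the query's peak, so the
    substituted segments stay temporally coherent with the query."""
    shift = int(np.argmax(query)) - int(np.argmax(prototype))
    return np.roll(prototype, shift)
```

Perturbing only the thresholded intervals, after aligning the prototype to the query's R-peak, is the combination the abstract credits for the improved temporal stability of the Aligned Sparse variant.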