🤖 AI Summary
This study addresses three key challenges in physiological signal prediction: the difficulty of generating counterfactual explanations (CFs), their limited clinical feasibility, and data scarcity. We propose a large language model (LLM)-based few-shot prompting framework for CF generation. Unlike conventional methods (e.g., DiCE), our approach employs structured prompting to jointly optimize intervention feasibility and data augmentation utility, enabling end-to-end CF generation for stress and heart disease prediction. Using GPT-4o-mini, zero-shot and three-shot prompting yield CFs with validity of up to 0.99 and plausibility of up to 99%. When used as augmented training data, these CFs improve classifier accuracy by 5% on average, mitigating performance degradation in low-resource settings. Our core contribution is the first systematic integration of LLM-driven CF generation into clinical physiological modeling, balancing interpretability, robustness, and clinical practicality.
📝 Abstract
Counterfactual explanations (CFs) offer human-centric insights into machine learning predictions by highlighting the minimal changes required to alter an outcome. CFs can therefore be used as (i) interventions for abnormality prevention and (ii) augmented data for training robust models. In this work, we explore large language models (LLMs), specifically GPT-4o-mini, for generating CFs in zero-shot and three-shot settings. We evaluate our approach on two datasets: the AI-Readi flagship dataset for stress prediction and a public dataset for heart disease detection. Compared to traditional methods such as DiCE, CFNOW, and NICE, our few-shot LLM-based approach achieves high plausibility (up to 99%), strong validity (up to 0.99), and competitive sparsity. Moreover, using LLM-generated CFs as augmented samples improves downstream classifier performance (an average accuracy gain of 5%), especially in low-data regimes. This demonstrates the potential of prompt-based generative techniques to enhance explainability and robustness in clinical and physiological prediction tasks. Code base: github.com/anonymous/SenseCF.
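
The abstract reports validity, plausibility, and sparsity without defining them. The sketch below illustrates one common reading of these metrics and is not taken from the paper's code base. It assumes that validity is the fraction of CFs that flip the classifier's prediction, that sparsity is the mean number of features changed per CF, and that plausibility is the fraction of CFs whose features all fall inside given value ranges. The classifier `toy_predict`, the feature layout, and the bounds are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Row = Sequence[float]


def validity(predict: Callable[[Row], int],
             originals: Sequence[Row],
             counterfactuals: Sequence[Row]) -> float:
    """Fraction of CFs whose predicted label differs from the original's (assumed definition)."""
    flipped = sum(predict(cf) != predict(x)
                  for x, cf in zip(originals, counterfactuals))
    return flipped / len(originals)


def sparsity(originals: Sequence[Row],
             counterfactuals: Sequence[Row],
             tol: float = 1e-9) -> float:
    """Mean number of features changed per CF (assumed definition; lower is sparser)."""
    changed = [sum(abs(a - b) > tol for a, b in zip(x, cf))
               for x, cf in zip(originals, counterfactuals)]
    return sum(changed) / len(changed)


def plausibility(counterfactuals: Sequence[Row],
                 bounds: Sequence[Tuple[float, float]]) -> float:
    """Fraction of CFs with every feature inside its [low, high] range (assumed definition)."""
    ok = sum(all(lo <= v <= hi for v, (lo, hi) in zip(cf, bounds))
             for cf in counterfactuals)
    return ok / len(counterfactuals)


def toy_predict(row: Row) -> int:
    """Hypothetical classifier: label 1 ("stressed") when heart rate exceeds 90 bpm."""
    return 1 if row[0] > 90 else 0


# Hypothetical feature layout: [heart_rate_bpm, sleep_hours].
originals = [[100.0, 5.0], [95.0, 6.0]]
cfs = [[85.0, 5.0], [95.0, 8.0]]      # the second CF does not flip the label
bounds = [(40.0, 180.0), (0.0, 12.0)]

print(validity(toy_predict, originals, cfs))   # 0.5
print(sparsity(originals, cfs))                # 1.0
print(plausibility(cfs, bounds))               # 1.0
```

In this toy case, only the first CF crosses the decision threshold, so validity is 0.5 even though both CFs change a single feature and stay within range. This is why the three metrics are reported separately.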