Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of balancing predictive performance and clinical interpretability in high-dimensional health data: traditional interpretable models struggle to capture nonlinearities and interactions, while complex models lack transparency. To bridge this gap, the authors propose an Exploratory AI Recommender that uses explainable artificial intelligence (XAI) to guide the design of high-dimensional predictive models. The approach fits a flexible machine learning model to uncover complex patterns and translates them into three types of modeling recommendations (feature exclusion, non-linear terms, and feature interactions) that are then applied to an interpretable Cox proportional hazards model. Evaluated on a fall-risk prediction task involving 245,614 patients, the method improved the C-index from 0.805 to 0.815 and improved calibration (intercept from -0.006 to 0.003; slope from 1.063 to 0.950). All recommendations were supported by existing literature, and the method was also effective on two additional public datasets.
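As a rough illustration of how the three recommendation types could be applied to a feature set before refitting an interpretable model, here is a minimal Python sketch. It is not the authors' implementation: the `Recommendations` class, the `augment_row` helper, the feature names, and the use of a squared term as the non-linear transform are all assumptions made for this example (the summary does not specify the functional form of the non-linear terms).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Recommendations:
    """Hypothetical container for the three recommendation types."""
    exclude: set[str] = field(default_factory=set)        # features to drop
    nonlinear: set[str] = field(default_factory=set)      # features that get an extra non-linear term
    interactions: list[tuple[str, str]] = field(default_factory=list)  # feature pairs to multiply


def augment_row(row: dict[str, float], rec: Recommendations) -> dict[str, float]:
    """Build the augmented feature vector for one patient."""
    # 1. Feature exclusion: keep only features that were not flagged.
    out = {name: value for name, value in row.items() if name not in rec.exclude}
    # 2. Non-linear terms: a squared term stands in for whatever transform is chosen (assumption).
    for name in sorted(rec.nonlinear):
        out[f"{name}^2"] = row[name] ** 2
    # 3. Feature interactions: product of the two features.
    for a, b in rec.interactions:
        out[f"{a}*{b}"] = row[a] * row[b]
    return out


# Hypothetical patient record and recommendations, for illustration only.
row = {"age": 80.0, "bmi": 22.0, "prior_falls": 2.0, "shoe_size": 42.0}
rec = Recommendations(
    exclude={"shoe_size"},
    nonlinear={"age"},
    interactions=[("age", "prior_falls")],
)
augmented = augment_row(row, rec)
print(augmented)
```

The augmented columns would then be passed to the same Cox proportional hazards fitting routine as the baseline columns, so the final model stays a transparent, coefficient-based model.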

📝 Abstract

Predictive modelling is important for health data analysis and data-driven clinical decision-making. However, predictive studies are challenging to design optimally by hand when tens or even hundreds of features require selection, transformation, or interaction modelling. While complex machine learning models offer high performance, their "black-box" nature limits the clinical trust, transparency, and interpretability required for decision-making. We developed and evaluated an Exploratory AI Recommender that provides data-driven recommendations to improve the predictive performance of existing interpretable statistical models. The framework uses flexible AI modelling to capture complex data patterns and explainable AI techniques to translate those patterns into three recommendation types: feature exclusion, non-linear terms, and feature interactions. We evaluated the framework by comparing the predictive performance of a baseline Cox Proportional Hazards (CPH) model (i.e., one with no interactions or non-linear terms) against an augmented CPH model incorporating the recommendations suggested by our method. The primary analysis predicted the time to the first occurrence of a fall or related injury in 245,614 patients. Our method recommended excluding 23 features, adding non-linear terms for two features, and adding 221 feature interactions. The C-index improved from 0.805 (95% CI 0.798-0.812) to 0.815 (95% CI 0.809-0.822), and calibration also improved (intercept: -0.006 to 0.003; slope: 1.063 to 0.950). All recommendations were supported by existing literature. The method also proved effective on two additional public datasets, demonstrating wider applicability. The proposed Exploratory AI Recommender demonstrates the potential of explainable AI and data-driven study design to improve both the development process and the performance of high-dimensional transparent predictive models.
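The abstract compares the baseline and augmented models by C-index. For readers unfamiliar with the metric, the sketch below computes Harrell's concordance index in plain Python: among pairs of patients where the one with the shorter observed time actually had the event, it counts how often the model assigned that patient the higher risk. The data and risk scores are invented toy values, not results from the paper, and this simplified version ignores pairs with tied times.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified: tied times are skipped).

    times  -- observed follow-up time per subject
    events -- 1 if the event was observed, 0 if censored
    risks  -- model risk score per subject (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event, higher risk: concordant
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
baseline_risks = [0.9, 0.3, 0.5, 0.1]    # misorders patients 2 and 3
augmented_risks = [0.9, 0.6, 0.5, 0.1]   # orders every comparable pair correctly

baseline_c = concordance_index(times, events, baseline_risks)
augmented_c = concordance_index(times, events, augmented_risks)
print(baseline_c, augmented_c)  # 0.8 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported change from 0.805 to 0.815.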

Problem

Research questions and friction points this paper is trying to address.

Explainable AI

Predictive modelling

High-dimensional data

Clinical decision-making

Interpretable models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable AI

Data-Driven Design

Feature Interaction