Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction

📅 2025-08-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Highly correlated features, which are common in medical imaging, reduce the statistical power of model-agnostic conditional importance methods and limit their clinical applicability. To address this, the paper proposes hierarchical Conditional Permutation Importance (hCPI, called Hierarchical-CPI in the abstract), a framework that tests groups of variables for joint predictive value along a feature tree and allocates importance across that tree, so that scores do not vanish when variables are highly correlated. The method provides explicit family-wise error rate control and is compatible with arbitrary black-box models. On two neuroimaging datasets, ADNI (dementia diagnosis from MRI) and TDBRAIN (the Berger effect in EEG), hCPI identifies biologically plausible variables.

📝 Abstract

Recent advances in machine learning have greatly expanded the repertoire of predictive methods for medical imaging. However, the interpretability of complex models remains a challenge, which limits their utility in medical applications. Recently, model-agnostic methods have been proposed to measure conditional variable importance and accommodate complex non-linear models. However, they often lack power when dealing with highly correlated data, a common problem in medical imaging. We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables that are jointly predictive of the outcome. By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control. Moreover, we address the issue of vanishing conditional importance under high correlation with a tree-based importance allocation mechanism. We benchmarked Hierarchical-CPI against state-of-the-art variable importance methods. Its effectiveness is demonstrated in two neuroimaging datasets: classifying dementia diagnoses from MRI data (ADNI dataset) and analyzing the Berger effect on EEG data (TDBRAIN dataset), identifying biologically plausible variables.
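To make the idea of conditional importance for a group of variables concrete, here is a minimal sketch, not the authors' implementation. It assumes a linear model of the group given the remaining columns (fitted on the test split for brevity), a squared-error loss, and synthetic data; the function name `group_conditional_importance` and all parameters are hypothetical.

```python
import numpy as np

def group_conditional_importance(predict, X_test, y_test, group, rng, n_perm=50):
    """Mean increase in squared error when `group` is conditionally permuted.

    predict: callable mapping an (n, p) array to (n,) predictions.
    group:   list of column indices treated as a single unit.
    """
    n, p = X_test.shape
    rest = [j for j in range(p) if j not in group]
    # Assumption: a linear model explains the group from the other columns.
    A = np.column_stack([np.ones(n), X_test[:, rest]])
    coef, *_ = np.linalg.lstsq(A, X_test[:, group], rcond=None)
    fitted = A @ coef
    resid = X_test[:, group] - fitted
    base_loss = (y_test - predict(X_test)) ** 2
    diffs = np.zeros(n)
    for _ in range(n_perm):
        X_tilde = X_test.copy()
        # Shuffle residual rows jointly: the dependence inside the group and
        # the part explained by the other columns are both preserved.
        X_tilde[:, group] = fitted + resid[rng.permutation(n)]
        diffs += (y_test - predict(X_tilde)) ** 2 - base_loss
    diffs /= n_perm
    return diffs.mean(), diffs

# Synthetic example: columns 0 and 1 are nearly identical, only column 0
# drives y, and columns 2 and 3 are pure noise.
rng = np.random.default_rng(0)
n = 400
x0 = rng.normal(size=n)
X = np.column_stack([x0, x0 + 0.05 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])
y = 2.0 * x0 + 0.5 * rng.normal(size=n)

half = n // 2
design = lambda M: np.column_stack([np.ones(len(M)), M])
beta, *_ = np.linalg.lstsq(design(X[:half]), y[:half], rcond=None)
predict = lambda M: design(M) @ beta  # stand-in for any black-box model

X_te, y_te = X[half:], y[half:]
imp_single, _ = group_conditional_importance(predict, X_te, y_te, [0], rng)
imp_group, _ = group_conditional_importance(predict, X_te, y_te, [0, 1], rng)
imp_noise, _ = group_conditional_importance(predict, X_te, y_te, [2, 3], rng)
print(imp_single, imp_group, imp_noise)
```

In this setup the importance of column 0 alone is small, because column 1 carries almost the same information, while the pair tested jointly shows a large loss increase. This is the low-power problem under high correlation that the abstract describes, and the motivation for testing groups.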

Problem

Research questions and friction points this paper is trying to address.

Improving interpretability of complex models in medical imaging

Addressing low power in correlated medical imaging data

Enhancing variable importance measurement with error control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical tree-based variable importance measure

Family-wise error rate control mechanism

Tree-based importance allocation for high correlation
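The combination of a feature hierarchy with family-wise error rate control can be sketched as a top-down search that only descends into groups that were rejected. The sketch below uses a size-weighted Bonferroni adjustment, one classical choice for hierarchical testing; whether the paper uses this exact adjustment is an assumption. The tree layout, the function `top_down_test`, and the p-values are hypothetical and made up for illustration.

```python
def top_down_test(node, pvalue_fn, n_total, alpha=0.05, inherited=0.0):
    """Return the smallest groups rejected along a feature tree.

    node:      dict with "vars" (list of indices) and optional "children".
    pvalue_fn: callable returning a raw p-value for a group of variables.
    n_total:   number of variables at the root of the tree.
    """
    group = node["vars"]
    # Size-weighted Bonferroni: smaller groups pay a larger correction.
    p_adj = min(1.0, pvalue_fn(group) * n_total / len(group))
    # A child can never look more significant than its ancestors.
    p_adj = max(p_adj, inherited)
    if p_adj > alpha:
        return []  # stop here: no descendant of this node is tested
    found = []
    for child in node.get("children", []):
        found.extend(top_down_test(child, pvalue_fn, n_total, alpha, p_adj))
    # Keep this node only if none of its children could be rejected.
    return found if found else [(tuple(group), p_adj)]

# Toy tree over four variables; variables 0 and 1 are assumed to be correlated.
tree = {
    "vars": [0, 1, 2, 3],
    "children": [
        {"vars": [0, 1], "children": [{"vars": [0]}, {"vars": [1]}]},
        {"vars": [2, 3], "children": [{"vars": [2]}, {"vars": [3]}]},
    ],
}

# Made-up raw p-values standing in for a group importance test.
raw_p = {(0, 1, 2, 3): 0.001, (0, 1): 0.002, (2, 3): 0.4,
         (0,): 0.03, (1,): 0.2, (2,): 0.5, (3,): 0.6}

discoveries = top_down_test(tree, lambda g: raw_p[tuple(g)], n_total=4)
print(discoveries)
```

With these toy numbers the pair (0, 1) is reported as a group, while neither variable passes on its own after correction and the branch (2, 3) is never explored below its root. Stopping at non-rejected nodes is also what keeps the search tractable, since most of the tree is never visited.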
