🤖 AI Summary
Problem: Accurately assessing and testing feature-group importance remains challenging for structured economic and business data characterized by small sample sizes and sparse or skewed distributions.
Method: This paper introduces Group Shapley values, which extend Shapley attribution from individual features to pre-specified feature groups, and establishes the asymptotic properties of the associated test statistics. It further proposes a significance test based on a three-cumulant chi-square approximation, which stays reliable with small samples and sparse or skewed distributions, where conventional alternatives such as the Wald test perform poorly.
Results: Simulations show that the proposed test keeps its empirical size (Type-I error rate) well controlled and achieves higher power than alternatives such as the Wald test. Lorenz curves and Gini indices indicate that Group Shapley allocates importance more equitably than individual Shapley values. Applied to a global dataset of 2,094 bond recovery rates (1996–2023) with 98 features organized into 16 subgroups and 5 broad categories, Group Shapley identifies “market-related variables” as the most influential feature group.
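The chi-square approximation named in the Method item can be read as matching the first three cumulants of the test statistic to those of a shifted and scaled chi-square variable. The sketch below shows that generic construction only. The symbols T, K_1, K_2, K_3, beta_0, beta_1 and d are notation assumed here, and the exact form of the paper's statistic is not given in this summary.

```latex
% Assumption: T is the test statistic with cumulants K_1(T), K_2(T), K_3(T),
% and K_3(T) > 0. The approximating variable is R = \beta_0 + \beta_1 \chi^2_d.
\begin{align*}
K_1(R) &= \beta_0 + \beta_1 d, &
K_2(R) &= 2\beta_1^2 d, &
K_3(R) &= 8\beta_1^3 d, \\
\beta_1 &= \frac{K_3(T)}{4K_2(T)}, &
d &= \frac{8K_2(T)^3}{K_3(T)^2}, &
\beta_0 &= K_1(T) - \frac{2K_2(T)^2}{K_3(T)}.
\end{align*}
```

Under this construction the null hypothesis would be rejected at level alpha when T exceeds beta_0 plus beta_1 times the (1 - alpha) quantile of the chi-square distribution with d degrees of freedom. In practice the cumulants would have to be estimated from the data. Matching the third cumulant is what lets the approximation track skewness, which a normal (Wald-type) approximation ignores.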
📝 Abstract
We propose Group Shapley, a metric that extends the classical individual-level Shapley value framework to evaluate the importance of feature groups, addressing the structured nature of predictors commonly found in business and economic data. More importantly, we develop a significance testing procedure based on a three-cumulant chi-square approximation and establish the asymptotic properties of the test statistics for Group Shapley values. Our approach can effectively handle challenging scenarios, including sparse or skewed distributions and small sample sizes, outperforming alternative tests such as the Wald test. Simulations confirm that the proposed test maintains robust empirical size and demonstrates enhanced power under diverse conditions. To illustrate the method's practical relevance in advancing Explainable AI, we apply our framework to bond recovery rate predictions using a global dataset (1996-2023) comprising 2,094 observations and 98 features, grouped into 16 subgroups and five broader categories: bond characteristics, firm fundamentals, industry-specific factors, market-related variables, and macroeconomic indicators. Our results identify the market-related variables group as the most influential. Furthermore, Lorenz curves and Gini indices reveal that Group Shapley assigns feature importance more equitably compared to individual Shapley values.
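The two descriptive quantities in the abstract, group-level Shapley values and the Gini index of the resulting importances, can be illustrated with a minimal Python sketch. It treats each feature group as one player in the classical Shapley formula. The function names `group_shapley` and `gini`, the group labels, and the toy value function are hypothetical and chosen for illustration only. The abstract does not specify the value function the authors use, and the numbers below are made up rather than taken from the paper.

```python
from itertools import combinations
from math import factorial


def group_shapley(groups, value):
    """Exact Shapley values with feature groups as the players.

    groups: list of hashable group labels.
    value:  callable mapping a frozenset of group labels to a float
            (hypothetical value function, e.g. the fit of a model
            that only uses the features in those groups).
    """
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude g.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


def gini(values):
    """Gini index of non-negative importances (0 means perfectly equal)."""
    x = sorted(values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form of the Gini index for ascending-sorted data.
    ranked = sum((i + 1) * v for i, v in enumerate(x))
    return 2.0 * ranked / (n * total) - (n + 1) / n


# Toy value function (made-up numbers): additive payoffs per group
# plus a synergy that appears only when groups "C" and "D" are both present.
PAYOFF = {"A": 0.10, "B": 0.15, "C": 0.30, "D": 0.05}


def toy_value(s):
    bonus = 0.08 if {"C", "D"} <= s else 0.0
    return sum(PAYOFF[g] for g in s) + bonus


phi = group_shapley(list(PAYOFF), toy_value)
print(phi)
print("Gini of group importances:", gini(phi.values()))
```

Exact enumeration visits every coalition, so its cost grows exponentially with the number of groups. That stays manageable when groups are the players (5 or 16 in the application above) but not when each of 98 individual features is a player, where a sampling approximation would usually be needed.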