Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
The Kitagawa–Oaxaca–Blinder (KOB) decomposition assumes a linear relationship between covariates and outcomes, so it can be unreliable when the true relationship is nonlinear. Extending it with common functional decompositions, namely functional ANOVA and Accumulated Local Effects (ALE), can produce “misattribution”: group-level outcome differences are attributed to differences in the conditional outcome function (outcomes given covariates) even when that function is identical across groups. Method: We state what a successful extension must not do (attribute a difference to a component that is the same in both populations), characterize when functional ANOVA misattributes, and examine how a decomposition's dependence on its input distribution relates to misattribution. Results: We show, even in simple examples, that functional ANOVA and ALE can misattribute. We give a general property that any discrete decomposition must satisfy to avoid misattribution, show that a decomposition that is independent of its input distribution does not misattribute, and conjecture that misattribution arises in any reasonable additive decomposition that depends on the distribution of the covariates.

📝 Abstract
In science and social science, we often wish to explain why an outcome is different in two populations. For instance, if a jobs program benefits members of one city more than another, is that due to differences in program participants (particular covariates) or the local labor markets (outcomes given covariates)? The Kitagawa-Oaxaca-Blinder (KOB) decomposition is a standard tool in econometrics that explains the difference in the mean outcome across two populations. However, the KOB decomposition assumes a linear relationship between covariates and outcomes, while the true relationship may be meaningfully nonlinear. Modern machine learning boasts a variety of nonlinear functional decompositions for the relationship between outcomes and covariates in one population. It seems natural to extend the KOB decomposition using these functional decompositions. We observe that a successful extension should not attribute the differences to covariates -- or, respectively, to outcomes given covariates -- if those are the same in the two populations. Unfortunately, we demonstrate that, even in simple examples, two common decompositions -- functional ANOVA and Accumulated Local Effects -- can attribute differences to outcomes given covariates, even when they are identical in two populations. We provide a characterization of when functional ANOVA misattributes, as well as a general property that any discrete decomposition must satisfy to avoid misattribution. We show that if the decomposition is independent of its input distribution, it does not misattribute. We further conjecture that misattribution arises in any reasonable additive decomposition that depends on the distribution of the covariates.
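
To make the linear KOB decomposition concrete, the following is a minimal sketch of a twofold version that uses population B's coefficients as the reference. The function name `kob_decompose`, the synthetic data, and the choice of reference population are assumptions made for illustration and are not taken from the paper. Both populations share the same nonlinear outcome function `f(x) = x**2` and differ only in their covariate distribution, yet the linear fit assigns a nonzero share of the gap to "outcomes given covariates". This illustrates the linearity limitation noted above, not the paper's functional ANOVA or ALE results.

```python
import numpy as np

def kob_decompose(x_a, y_a, x_b, y_b):
    """Twofold KOB decomposition of mean(y_a) - mean(y_b).

    Returns (covariate_part, outcome_part), using population B's
    OLS coefficients as the reference (an illustrative choice).
    """
    # Design matrices with an intercept column.
    X_a = np.column_stack([np.ones(len(x_a)), x_a])
    X_b = np.column_stack([np.ones(len(x_b)), x_b])
    # Separate OLS fit in each population.
    beta_a, *_ = np.linalg.lstsq(X_a, y_a, rcond=None)
    beta_b, *_ = np.linalg.lstsq(X_b, y_b, rcond=None)
    xbar_a = X_a.mean(axis=0)
    xbar_b = X_b.mean(axis=0)
    # Part attributed to differences in covariates.
    covariate_part = (xbar_a - xbar_b) @ beta_b
    # Part attributed to differences in outcomes given covariates.
    outcome_part = xbar_a @ (beta_a - beta_b)
    return covariate_part, outcome_part

# Synthetic data (hypothetical): identical outcome function, shifted covariates.
rng = np.random.default_rng(0)
x_a = rng.normal(0.0, 1.0, size=5000)
x_b = rng.normal(2.0, 1.0, size=5000)
y_a = x_a ** 2
y_b = x_b ** 2

covariate_part, outcome_part = kob_decompose(x_a, y_a, x_b, y_b)
gap = y_a.mean() - y_b.mean()
print(f"gap={gap:.3f} covariates={covariate_part:.3f} outcomes|covariates={outcome_part:.3f}")
```

Because each OLS fit includes an intercept, the two parts sum to the observed gap in means. The outcome part is far from zero here even though the outcome function is the same in both populations.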
Problem

Research questions and friction points this paper is trying to address.

Explains outcome differences between populations using decompositions
Identifies misattribution in nonlinear functional decompositions
Proposes properties to avoid decomposition misattribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Examines extending the KOB decomposition with nonlinear functional decompositions from machine learning
Identifies misattribution in functional ANOVA and ALE
Proposes distribution-independent decompositions to avoid misattribution
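
One way to see why a distribution-dependent decomposition can behave this way is to compute a functional ANOVA main effect for the same function under two input distributions. The sketch below is an illustration under stated assumptions, not the paper's construction: the function `f(x1, x2) = x1 * x2`, the helper name `fanova_main_effect_x1`, the independent Gaussian inputs, and the Monte Carlo estimator are all hypothetical choices. With independent inputs, the main effect of `x1` is the average of `f(x1, X2)` over `X2`, minus the overall mean of `f`.

```python
import numpy as np

def f(x1, x2):
    # Same outcome function in both populations (hypothetical example).
    return x1 * x2

def fanova_main_effect_x1(f, x1_value, x1_sample, x2_sample):
    """Monte Carlo estimate of the functional ANOVA main effect of x1
    at x1_value, assuming x1 and x2 are independent."""
    f0 = f(x1_sample, x2_sample).mean()          # constant term E[f(X1, X2)]
    return f(x1_value, x2_sample).mean() - f0    # E[f(x1, X2)] - f0

rng = np.random.default_rng(0)
n = 20000
# Population A: x2 centered at 0. Population B: x2 centered at 1.
x1_a, x2_a = rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)
x1_b, x2_b = rng.normal(0.0, 1.0, n), rng.normal(1.0, 1.0, n)

effect_a = fanova_main_effect_x1(f, 1.0, x1_a, x2_a)
effect_b = fanova_main_effect_x1(f, 1.0, x1_b, x2_b)
print(f"main effect of x1 at 1.0: A={effect_a:.3f}, B={effect_b:.3f}")
```

The function is identical in both populations, but its `x1` component is roughly zero in A and roughly one in B, because the component is defined through the distribution of `x2`. Comparing components across populations would therefore register a difference that comes entirely from the covariates.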
Manuel Quintero
MIT IDSS
William T. Stephenson
MIT Lincoln Laboratory
Advik Shreekumar
MIT Economics
Tamara Broderick
Associate Professor of EECS, Massachusetts Institute of Technology
Machine Learning, Statistics, Bayesian Inference