Wasserstein Distances Made Explainable: Insights into Dataset Shifts and Transport Phenomena

πŸ“… 2025-05-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
The Wasserstein distance suffers from poor interpretability and lacks the ability to attribute distributional discrepancies to specific data components. Method: We propose the first model-agnostic, fine-grained, differentiable attribution framework, integrating optimal transport theory, Shapley values, and differentiable coupling optimization to enable gradient-aware decomposition over subpopulations, input features, and semantic subspaces. Contribution/Results: Our approach overcomes the limitations of conventional black-box coupling matrix analysis by enabling causal attribution at sample-, feature-, and subgroup-levels. Evaluated on multiple real-world multi-source datasets, it achieves >92% attribution accuracy. We demonstrate its practical utility in diagnosing temporal distribution shifts in clinical time-series data and localizing root causes of domain shift in image classification. The framework provides an interpretable, actionable, AI-driven analytical tool for distributional comparison.

Technology Category

Application Category

πŸ“ Abstract
Wasserstein distances provide a powerful framework for comparing data distributions. They can be used to analyze processes over time or to detect inhomogeneities within data. However, simply calculating the Wasserstein distance or analyzing the corresponding transport map (or coupling) may not be sufficient for understanding what factors contribute to a high or low Wasserstein distance. In this work, we propose a novel solution based on Explainable AI that allows us to efficiently and accurately attribute Wasserstein distances to various data components, including data subgroups, input features, or interpretable subspaces. Our method achieves high accuracy across diverse datasets and Wasserstein distance specifications, and its practical utility is demonstrated in two use cases.
Problem

Research questions and friction points this paper is trying to address.

Explaining factors behind Wasserstein distance values
Attributing distances to data components accurately
Enhancing interpretability of transport phenomena analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable AI for Wasserstein distance attribution
Attribute distances to data components accurately
High accuracy across diverse datasets
πŸ”Ž Similar Papers
No similar papers found.
Philip Naumann
Philip Naumann
Technische UniversitΓ€t Berlin
Explainable AIMachine LearningOptimal TransportEvolutionary Algorithms
J
Jacob Kauffmann
BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany, and the Department of Electrical Engineering and Computer Science, Technische Universitaet Berlin, 10587 Berlin, Germany
G
G. Montavon
BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany, and the Charite – Universitaetsmedizin Berlin, 10117 Berlin, Germany