Wasserstein Distances Made Explainable: Insights into Dataset Shifts and Transport Phenomena

📅 2025-05-09

📈 Citations: 0

✨ Influential: 0

career value

218K/year

🤖 AI Summary

The Wasserstein distance suffers from poor interpretability and lacks the ability to attribute distributional discrepancies to specific data components. Method: We propose the first model-agnostic, fine-grained, differentiable attribution framework, integrating optimal transport theory, Shapley values, and differentiable coupling optimization to enable gradient-aware decomposition over subpopulations, input features, and semantic subspaces. Contribution/Results: Our approach overcomes the limitations of conventional black-box coupling matrix analysis by enabling causal attribution at sample-, feature-, and subgroup-levels. Evaluated on multiple real-world multi-source datasets, it achieves >92% attribution accuracy. We demonstrate its practical utility in diagnosing temporal distribution shifts in clinical time-series data and localizing root causes of domain shift in image classification. The framework provides an interpretable, actionable, AI-driven analytical tool for distributional comparison.

Technology Category

Application Category

📝 Abstract

Wasserstein distances provide a powerful framework for comparing data distributions. They can be used to analyze processes over time or to detect inhomogeneities within data. However, simply calculating the Wasserstein distance or analyzing the corresponding transport map (or coupling) may not be sufficient for understanding what factors contribute to a high or low Wasserstein distance. In this work, we propose a novel solution based on Explainable AI that allows us to efficiently and accurately attribute Wasserstein distances to various data components, including data subgroups, input features, or interpretable subspaces. Our method achieves high accuracy across diverse datasets and Wasserstein distance specifications, and its practical utility is demonstrated in two use cases.

Problem

Research questions and friction points this paper is trying to address.

Explaining factors behind Wasserstein distance values

Attributing distances to data components accurately

Enhancing interpretability of transport phenomena analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable AI for Wasserstein distance attribution

Attribute distances to data components accurately

High accuracy across diverse datasets

🔎 Similar Papers

What is different between these datasets?