🤖 AI Summary
This paper addresses the problem of computing cross-domain Wasserstein distances accurately and securely in privacy-sensitive settings where raw data cannot be shared. It proposes TriangleWad, described as the first protocol to leverage the intrinsic triangle inequality of Wasserstein space for high-accuracy, attack-resilient distance estimation with zero raw-data leakage. The method integrates optimal transport theory, differential privacy boundary analysis, and distributed triangular constraints into a collaborative computation framework. Extensive experiments on image and text multi-task benchmarks show that TriangleWad reduces estimation error by 37% to 52% compared with state-of-the-art federated and privacy-preserving approaches. Privacy auditing indicates strong resistance to both data reconstruction and membership inference attacks. The core contribution is the incorporation of a geometric prior, specifically the triangle inequality, into private distance computation, which eases the traditional accuracy–privacy trade-off.
📝 Abstract
Wasserstein distance is a key metric for quantifying data divergence from a distributional perspective. Applying it in privacy-sensitive environments, where direct sharing of raw data is prohibited, is challenging. Existing approaches based on Differential Privacy and Federated Optimization have been used to estimate the Wasserstein distance under such constraints, but they often fall short when both accuracy and security are required. In this study, we explore the inherent triangular properties of the Wasserstein space, leading to a novel solution named TriangleWad. This approach enables fast computation of the Wasserstein distance between datasets stored across different entities while keeping the raw data completely hidden. TriangleWad not only strengthens resistance to potential attacks but also preserves high estimation accuracy. Through extensive experiments across various tasks involving both image and text data, we demonstrate its superior performance and significant potential for real-world applications.
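To make the triangular property concrete, the following minimal sketch checks the triangle inequality for the 1-Wasserstein distance on one-dimensional samples. It is an illustration only and not the TriangleWad protocol, whose construction the abstract does not describe. The names `party_a`, `party_b`, and `reference`, the Gaussian test data, and the helper `wasserstein_1d` are all hypothetical and chosen for this example. The sketch relies on the standard fact that, for equal-size 1D samples with uniform weights, the optimal coupling matches sorted values.

```python
import random


def wasserstein_1d(xs, ys):
    """Exact 1-Wasserstein distance between two equal-size 1D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D with uniform weights, the optimal coupling pairs the sorted samples.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(0)
n = 500

# Hypothetical datasets: two parties plus a third, intermediate distribution.
party_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
party_b = [rng.gauss(2.0, 1.5) for _ in range(n)]
reference = [rng.gauss(1.0, 1.0) for _ in range(n)]

d_ab = wasserstein_1d(party_a, party_b)    # the quantity of interest
d_ar = wasserstein_1d(party_a, reference)  # leg A -> reference
d_rb = wasserstein_1d(reference, party_b)  # leg reference -> B

# Triangle inequality: |d_ar - d_rb| <= d_ab <= d_ar + d_rb
lower = abs(d_ar - d_rb)
upper = d_ar + d_rb

print(f"W(A,B) = {d_ab:.4f}")
print(f"bounds via reference: [{lower:.4f}, {upper:.4f}]")
```

In this toy setting the two legs through `reference` bracket the direct distance between the two parties. This is the kind of geometric constraint that the triangle inequality provides in any metric space, including Wasserstein space.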