🤖 AI Summary
This work addresses a theoretical limitation of the Kullback–Leibler (KL) divergence in tasks requiring metric properties: it violates the triangle inequality. Focusing on multivariate Gaussian distributions, the paper establishes the first tight relaxed triangle inequality for the KL divergence: under the constraints $KL(\mathcal{N}_1, \mathcal{N}_2) \leq \epsilon_1$ and $KL(\mathcal{N}_2, \mathcal{N}_3) \leq \epsilon_2$, it derives the sharp upper bound on $KL(\mathcal{N}_1, \mathcal{N}_3)$ together with the conditions under which this bound is attained. For small perturbations, the bound takes the asymptotic form $\epsilon_1 + \epsilon_2 + 2\sqrt{\epsilon_1\epsilon_2} + o(\epsilon_1) + o(\epsilon_2)$. The result leverages the analytical structure of the Gaussian KL divergence through tools from information geometry and optimization theory. The derived bound has been successfully applied to out-of-distribution detection in flow-based generative models and to safe reinforcement learning, improving both theoretical rigor and empirical performance.
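The analytical structure mentioned above comes from the closed-form KL divergence between multivariate Gaussians. A minimal NumPy sketch of that standard formula (illustrative only, not the paper's code; the function name is ours):

```python
import numpy as np

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL(N(mu0, sigma0) || N(mu1, sigma1)) for k-dimensional Gaussians.

    Standard closed form:
    0.5 * ( tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - k
            + ln(det S1 / det S0) )
    """
    mu0, mu1 = np.asarray(mu0, dtype=float), np.asarray(mu1, dtype=float)
    sigma0, sigma1 = np.atleast_2d(sigma0), np.atleast_2d(sigma1)
    k = mu0.shape[0]
    sigma1_inv = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(sigma1_inv @ sigma0)          # trace term
        + diff @ sigma1_inv @ diff             # Mahalanobis term
        - k                                    # dimension offset
        + np.log(np.linalg.det(sigma1) / np.linalg.det(sigma0))
    )
```

Note the asymmetry of the formula in its two arguments, which is exactly why KL is not a metric and why only a *relaxed* triangle inequality can hold.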
📝 Abstract
The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that the KL divergence between multivariate Gaussian distributions satisfies a relaxed triangle inequality: given any three multivariate Gaussian distributions $\mathcal{N}_1, \mathcal{N}_2$, and $\mathcal{N}_3$, if $KL(\mathcal{N}_1, \mathcal{N}_2)\leq \epsilon_1$ and $KL(\mathcal{N}_2, \mathcal{N}_3)\leq \epsilon_2$, then $KL(\mathcal{N}_1, \mathcal{N}_3)<3\epsilon_1+3\epsilon_2+2\sqrt{\epsilon_1\epsilon_2}+o(\epsilon_1)+o(\epsilon_2)$. However, the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ is still unknown. In this paper, we investigate the relaxed triangle inequality for the KL divergence between multivariate Gaussian distributions and give the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ as well as the conditions under which the supremum can be attained. When $\epsilon_1$ and $\epsilon_2$ are small, the supremum is $\epsilon_1+\epsilon_2+2\sqrt{\epsilon_1\epsilon_2}+o(\epsilon_1)+o(\epsilon_2)$. Finally, we demonstrate several applications of our results in out-of-distribution detection with flow-based generative models and safe reinforcement learning.
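The triangle-inequality failure is easy to see numerically. A toy 1-D check (our own illustration, not from the paper): for unit-variance Gaussians the KL divergence reduces to half the squared mean gap, so three collinear means already break the inequality.

```python
# 1-D unit-variance Gaussians differing only in mean:
# KL(N(a,1) || N(b,1)) = (a - b)**2 / 2.
def kl_1d(mu_a, mu_b):
    return (mu_a - mu_b) ** 2 / 2.0

mu1, mu2, mu3 = 0.0, 0.3, 0.7
eps1 = kl_1d(mu1, mu2)   # KL(N1, N2) = 0.045
eps2 = kl_1d(mu2, mu3)   # KL(N2, N3) = 0.08
kl13 = kl_1d(mu1, mu3)   # KL(N1, N3) = 0.245

# Collinear means give kl13 = eps1 + eps2 + 2*sqrt(eps1*eps2),
# strictly larger than eps1 + eps2: the triangle inequality fails.
print(kl13 > eps1 + eps2)  # True
```

This construction also shows why a sharp relaxed bound must carry a $\sqrt{\epsilon_1\epsilon_2}$ cross term and not just $\epsilon_1+\epsilon_2$.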