🤖 AI Summary
This work addresses a theoretical limitation of the Kullback–Leibler (KL) divergence in tasks requiring metric properties: it violates the triangle inequality. Focusing on multivariate Gaussian distributions, the paper establishes the first tight relaxed triangle inequality for the KL divergence: under the constraints $KL(\mathcal{N}_1, \mathcal{N}_2) \leq \epsilon_1$ and $KL(\mathcal{N}_2, \mathcal{N}_3) \leq \epsilon_2$, it derives the sharp upper bound on $KL(\mathcal{N}_1, \mathcal{N}_3)$ together with the conditions under which this bound is attained. For small perturbations, the bound takes the asymptotic form $\epsilon_1 + \epsilon_2 + 2\sqrt{\epsilon_1\epsilon_2} + o(\epsilon_1) + o(\epsilon_2)$. The result leverages the analytical structure of the Gaussian KL divergence through tools from information geometry and optimization theory. The derived bound has been successfully applied to out-of-distribution detection in flow-based generative models and to safe reinforcement learning, improving both theoretical rigor and empirical performance.
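The analytical structure mentioned above comes from the closed-form KL divergence between multivariate Gaussians. A minimal NumPy sketch of that standard formula (illustrative only, not the paper's code; the function name is ours):

```python
import numpy as np

def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL(N(mu0, sigma0) || N(mu1, sigma1)) for k-dimensional Gaussians.

    Standard closed form:
    0.5 * ( tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - k
            + ln(det S1 / det S0) )
    """
    mu0, mu1 = np.asarray(mu0, dtype=float), np.asarray(mu1, dtype=float)
    sigma0, sigma1 = np.atleast_2d(sigma0), np.atleast_2d(sigma1)
    k = mu0.shape[0]
    sigma1_inv = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    return 0.5 * (
        np.trace(sigma1_inv @ sigma0)          # trace term
        + diff @ sigma1_inv @ diff             # Mahalanobis term
        - k                                    # dimension offset
        + np.log(np.linalg.det(sigma1) / np.linalg.det(sigma0))
    )
```

Note the asymmetry of the formula in its two arguments, which is exactly why KL is not a metric and why only a *relaxed* triangle inequality can hold.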
📝 Abstract
The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that the KL divergence between multivariate Gaussian distributions satisfies a relaxed triangle inequality: given any three multivariate Gaussian distributions $\mathcal{N}_1, \mathcal{N}_2$, and $\mathcal{N}_3$, if $KL(\mathcal{N}_1, \mathcal{N}_2)\leq \epsilon_1$ and $KL(\mathcal{N}_2, \mathcal{N}_3)\leq \epsilon_2$, then $KL(\mathcal{N}_1, \mathcal{N}_3)<3\epsilon_1+3\epsilon_2+2\sqrt{\epsilon_1\epsilon_2}+o(\epsilon_1)+o(\epsilon_2)$. However, the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ is still unknown. In this paper, we investigate the relaxed triangle inequality for the KL divergence between multivariate Gaussian distributions and give the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ as well as the conditions under which the supremum can be attained. When $\epsilon_1$ and $\epsilon_2$ are small, the supremum is $\epsilon_1+\epsilon_2+2\sqrt{\epsilon_1\epsilon_2}+o(\epsilon_1)+o(\epsilon_2)$. Finally, we demonstrate several applications of our results in out-of-distribution detection with flow-based generative models and safe reinforcement learning.
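The triangle-inequality failure is easy to see numerically. A toy 1-D check (our own illustration, not from the paper): for unit-variance Gaussians the KL divergence reduces to half the squared mean gap, so three collinear means already break the inequality.

```python
# 1-D unit-variance Gaussians differing only in mean:
# KL(N(a,1) || N(b,1)) = (a - b)**2 / 2.
def kl_1d(mu_a, mu_b):
    return (mu_a - mu_b) ** 2 / 2.0

mu1, mu2, mu3 = 0.0, 0.3, 0.7
eps1 = kl_1d(mu1, mu2)   # KL(N1, N2) = 0.045
eps2 = kl_1d(mu2, mu3)   # KL(N2, N3) = 0.08
kl13 = kl_1d(mu1, mu3)   # KL(N1, N3) = 0.245

# Collinear means give kl13 = eps1 + eps2 + 2*sqrt(eps1*eps2),
# strictly larger than eps1 + eps2: the triangle inequality fails.
print(kl13 > eps1 + eps2)  # True
```

This construction also shows why a sharp relaxed bound must carry a $\sqrt{\epsilon_1\epsilon_2}$ cross term and not just $\epsilon_1+\epsilon_2$.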