Implicit Bias of the JKO Scheme

📅 2025-11-18

📈 Citations: 0

✨ Influential: 0

career value

207K/year

🤖 AI Summary

The Jordan–Kinderlehrer–Otto (JKO) scheme exhibits an implicit bias when approximating Wasserstein gradient flows, particularly at second-order time step size η, whose geometric origin and functional consequences remain poorly understood. Method: Leveraging asymptotic variational analysis on the Wasserstein manifold—viewed as a Riemannian metric space—we characterize this bias through Wasserstein geometric calculus, identifying an equivalent explicit correction to the energy functional. Contribution/Results: We prove that the JKO iteration implicitly minimizes a curvature-corrected functional $J^eta = J + frac{eta}{2}mathcal{R}_J$, where $mathcal{R}_J$ is governed by the Wasserstein Hessian of $J$. This correction induces a systematic deceleration effect, unifying the implicit regularization observed for canonical functionals—including entropy and KL divergence—and explaining emergent structures such as Fisher information and Fisher–Hyvärinen divergence. Based on this insight, we propose the JKO-Flow framework and validate it numerically in the Bures–Wasserstein space and one-dimensional quartic potential sampling, demonstrating that the corrected gradient flow more accurately captures JKO sequence dynamics and significantly improves discrete dynamical modeling fidelity.

Technology Category

Application Category

📝 Abstract

Wasserstein gradient flow provides a general framework for minimizing an energy functional $J$ over the space of probability measures on a Riemannian manifold $(M,g)$. Its canonical time-discretization, the Jordan-Kinderlehrer-Otto (JKO) scheme, produces for any step size $eta>0$ a sequence of probability distributions $ ho_k^eta$ that approximate to first order in $eta$ Wasserstein gradient flow on $J$. But the JKO scheme also has many other remarkable properties not shared by other first order integrators, e.g. it preserves energy dissipation and exhibits unconditional stability for $lambda$-geodesically convex functionals $J$. To better understand the JKO scheme we characterize its implicit bias at second order in $eta$. We show that $ ho_k^eta$ are approximated to order $eta^2$ by Wasserstein gradient flow on a emph{modified} energy [ J^{eta}( ho) = J( ho) - frac{eta}{4}int_M BiglVert abla_g frac{delta J}{delta ho} ( ho) Big Vert_{2}^{2} , ho(dx), ] obtained by subtracting from $J$ the squared metric curvature of $J$ times $eta/4$. The JKO scheme therefore adds at second order in $eta$ a extit{deceleration} in directions where the metric curvature of $J$ is rapidly changing. This corresponds to canonical implicit biases for common functionals: for entropy the implicit bias is the Fisher information, for KL-divergence it is the Fisher-Hyv{""a}rinen divergence, and for Riemannian gradient descent it is the kinetic energy in the metric $g$. To understand the differences between minimizing $J$ and $J^eta$ we study emph{JKO-Flow}, Wasserstein gradient flow on $J^eta$, in several simple numerical examples. These include exactly solvable Langevin dynamics on the Bures-Wasserstein space and Langevin sampling from a quartic potential in 1D.

Problem

Research questions and friction points this paper is trying to address.

Characterizing second-order implicit bias in JKO scheme for Wasserstein gradient flows

Analyzing modified energy functional with curvature correction term

Studying JKO-Flow behavior through numerical examples and comparisons

Innovation

Methods, ideas, or system contributions that make the work stand out.

JKO scheme modifies energy with curvature subtraction

Second order bias adds deceleration in curvature directions

Modified energy incorporates Fisher information for entropy

🔎 Similar Papers

Analyzing Social Biases in Japanese Large Language Models