Scalable Bi-causal Optimal Transport via KL Relaxation and Policy Gradients

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of computing bi-causal optimal transport (OT) couplings for continuous distributions and long-horizon stochastic processes under nonanticipative information constraints. By replacing hard marginal constraints with a Kullback–Leibler (KL) divergence penalty, the authors develop a scalable stochastic optimization framework that preserves the recursive structure of the problem. Key contributions include dynamic programming principles for both the original and the KL-relaxed formulations, a proof that the relaxed problem converges to the original bi-causal OT problem as the penalty grows, and explicit policy-gradient representations of the relaxed objective. The resulting policy-gradient algorithm uses unbiased mini-batch estimators and variance reduction and comes with nonasymptotic regret guarantees. In numerical experiments, the method accurately captures marginal laws and temporal dependence and performs well in applications such as robust subhedging and statistical downscaling of time series, broadening the practical applicability of bi-causal OT.
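To make the "recursive structure" concrete, here is a minimal sketch of the dynamic programming recursion for the unrelaxed bi-causal OT problem on a two-period binary tree. It assumes the standard discrete-time characterization in which a bi-causal coupling is built from successive kernels, each of which must exactly couple the next-step conditional laws of X and Y given the shared history, together with an additive cost c(x, y) = |x1 - y1| + |x2 - y2|. These exact per-history marginal constraints are what the KL relaxation replaces with penalties. The tree parametrization and the names `ot_2x2`, `bicausal_ot_two_period`, and `STEP_COST` are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical per-period cost |x_t - y_t| on binary states {0, 1}.
STEP_COST = np.array([[0.0, 1.0], [1.0, 0.0]])


def ot_2x2(p, q, cost):
    """Minimize sum(pi * cost) over 2x2 couplings pi with row sums p and column sums q.

    The feasible couplings form a segment parameterized by a = pi[0, 0], and the
    objective is linear in a, so an optimum sits at one of the two endpoints.
    """
    lo = max(0.0, p[0] + q[0] - 1.0)
    hi = min(p[0], q[0])
    candidates = []
    for a in (lo, hi):
        pi = np.array([[a, p[0] - a], [q[0] - a, 1.0 - p[0] - q[0] + a]])
        candidates.append((float(np.sum(pi * cost)), pi))
    return min(candidates, key=lambda t: t[0])


def bicausal_ot_two_period(mu1, mu2, nu1, nu2):
    """Backward recursion for bi-causal OT on a two-period binary tree (hypothetical setup).

    mu1, nu1: length-2 arrays, the laws of X1 and Y1.
    mu2[x1] = P(X2 = 1 | X1 = x1), nu2[y1] = P(Y2 = 1 | Y1 = y1).
    """
    # Step 2: for every pair of histories, couple the conditional laws of X2 and Y2.
    V2 = np.zeros((2, 2))
    for x1 in range(2):
        for y1 in range(2):
            p = np.array([1.0 - mu2[x1], mu2[x1]])
            q = np.array([1.0 - nu2[y1], nu2[y1]])
            V2[x1, y1], _ = ot_2x2(p, q, STEP_COST)
    # Step 1: couple the laws of X1 and Y1 under the cost-to-go c1 + V2.
    value, pi1 = ot_2x2(mu1, nu1, STEP_COST + V2)
    return value, pi1, V2


# Usage: X is a persistent binary chain, Y is a pair of i.i.d. fair coins.
value, pi1, V2 = bicausal_ot_two_period(
    np.array([0.5, 0.5]), np.array([0.2, 0.8]), np.array([0.5, 0.5]), np.array([0.5, 0.5])
)
print(value, pi1, V2, sep="\n")
```

Because each kernel here is a 2x2 coupling with fixed marginals, every inner problem is a one-parameter linear program. With continuous states or long horizons, one such constrained problem is needed for every pair of histories, which is the kind of cost that motivates relaxing the constraints.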

📝 Abstract

Bi-causal optimal transport (OT) is a natural framework for comparing and coupling stochastic processes under nonanticipative information constraints, with important applications in robust finance, sequential uncertainty quantification, and multistage stochastic optimization. In particular, a learned bi-causal coupling naturally serves as a simulator for generating joint sample paths that respect both prescribed marginal laws and the underlying information flow. Its practical use, however, is limited by the computational difficulty of enforcing bi-causal coupling constraints over path space, especially for continuous distributions and long horizons. We develop a scalable stochastic-optimization framework for computing bi-causal OT couplings under general marginals. Our approach introduces a Kullback–Leibler (KL)-penalized relaxation that replaces hard marginal constraints with tractable divergence penalties while preserving the recursive structure of the problem. We establish dynamic programming principles for both the original and relaxed formulations, prove that the relaxed problem converges to the original bi-causal OT problem as the penalty grows, and derive explicit policy-gradient representations for the relaxed objective. Building on these results, we propose a practical policy-gradient algorithm with unbiased mini-batch estimators, variance reduction, and nonasymptotic regret guarantees. Numerical experiments show that the method accurately captures marginal laws and temporal dependence, and performs well in applications including robust subhedging and time series statistical downscaling. These results provide a scalable computational approach to bi-causal OT and broaden its applicability in settings where nonanticipative information constraints are essential.
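The kind of gradient identity behind a KL-penalized policy gradient can be checked on a toy problem. The sketch below assumes a single period (where every coupling is trivially bi-causal), a finite grid, a coupling π = softmax(θ) over grid cells, and a hypothetical penalized objective J(θ) = E_π[c] + λ KL(π_X ‖ μ) + λ KL(π_Y ‖ ν). Differentiating gives ∇J = E_π[r ∇ log π] with r(x, y) = c(x, y) + λ log(π_X(x)/μ(x)) + λ log(π_Y(y)/ν(y)), so a REINFORCE-style mini-batch estimator with a leave-one-out baseline is unbiased. This illustrates the general technique and is not the paper's estimator. In particular, the marginals π_X and π_Y are computed exactly here, whereas a continuous, multi-period setting would have to estimate them. All function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a flat logit vector."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def cell_rewards(pi, cost, mu, nu, lam):
    """Per-cell integrand r(x, y) = c(x, y) + lam*log(pi_X(x)/mu(x)) + lam*log(pi_Y(y)/nu(y))."""
    px, py = pi.sum(axis=1), pi.sum(axis=0)
    return cost + lam * (np.log(px / mu)[:, None] + np.log(py / nu)[None, :])


def penalized_objective(theta, cost, mu, nu, lam):
    """J(theta) = E_pi[c] + lam*KL(pi_X || mu) + lam*KL(pi_Y || nu), with pi = softmax(theta)."""
    pi = softmax(theta.ravel()).reshape(theta.shape)
    px, py = pi.sum(axis=1), pi.sum(axis=0)
    kl_x = np.sum(px * np.log(px / mu))
    kl_y = np.sum(py * np.log(py / nu))
    return float(np.sum(pi * cost) + lam * (kl_x + kl_y))


def exact_gradient(theta, cost, mu, nu, lam):
    """dJ/dtheta_k = pi_k * (r_k - E_pi[r]); the '+1' from d(p log p) cancels since sum(pi) = 1."""
    pi = softmax(theta.ravel()).reshape(theta.shape)
    r = cell_rewards(pi, cost, mu, nu, lam)
    return pi * (r - np.sum(pi * r))


def score_function_gradient(theta, cost, mu, nu, lam, batch, rng):
    """Mini-batch REINFORCE estimate of dJ/dtheta with a leave-one-out baseline."""
    n, m = theta.shape
    pi = softmax(theta.ravel())
    r = cell_rewards(pi.reshape(n, m), cost, mu, nu, lam).ravel()
    idx = rng.choice(n * m, size=batch, p=pi)              # sample cells (x, y) ~ pi
    rewards = r[idx]
    baseline = (rewards.sum() - rewards) / (batch - 1)     # excludes each sample's own reward
    adv = rewards - baseline
    # grad of log softmax at cell k is e_k - pi, so average adv_i * (e_{k_i} - pi).
    grad = np.zeros(n * m)
    np.add.at(grad, idx, adv)
    grad = grad / batch - pi * adv.mean()
    return grad.reshape(n, m)


# Usage: compare the Monte Carlo estimate with the exact gradient on a 3-point grid.
rng = np.random.default_rng(0)
grid = np.arange(3, dtype=float)
cost = (grid[:, None] - grid[None, :]) ** 2                # squared-distance cost
mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.2, 0.3, 0.5])
theta = rng.normal(size=(3, 3))
g_mc = score_function_gradient(theta, cost, mu, nu, 2.0, 100_000, rng)
g_exact = exact_gradient(theta, cost, mu, nu, 2.0)
print(np.abs(g_mc - g_exact).max())
```

Using the plain batch mean as the baseline would introduce an O(1/batch) bias, because the baseline would then depend on the very sample it multiplies. The leave-one-out version avoids this at no extra cost.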

Problem

Research questions and friction points this paper is trying to address.

bi-causal optimal transport

nonanticipative constraints

stochastic processes

computational scalability

path-space coupling

Innovation

Methods, ideas, or system contributions that make the work stand out.

bi-causal optimal transport

KL relaxation

policy gradients