Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

📅 2026-04-16

🤖 AI Summary

This work addresses optimal last-iterate convergence in zero-sum matrix games with bandit feedback, a setting in which existing uncoupled algorithms fall short of the known lower bound. The authors propose an online mirror descent method with log-barrier regularization, analyzed through a dual-focused framework. It is the first to attain, with high probability, an exploitability gap of Õ(t^{-1/4}) in the uncoupled setting, matching the lower bound of Ω(t^{-1/4}) up to logarithmic factors. The approach also extends to extensive-form games, with a bound of the same rate.

📝 Abstract

We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of Ω(t^{-1/4}) on the exploitability gap. Some online mirror descent algorithms have been proposed in the literature for this problem, but none has truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, allows this Õ(t^{-1/4}) convergence with high probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.
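
To make the ingredients named in the abstract concrete, here is a minimal Python sketch of uncoupled self-play using online mirror descent with a log-barrier regularizer. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the constant step size `eta`, the importance-weighted loss estimator, and the convention that the row player minimizes x^T A y with payoffs in [0, 1] are all choices made here, and the sketch carries none of the paper's guarantees.

```python
import numpy as np

def log_barrier_omd_step(x, loss_est, eta, iters=100):
    """One OMD step on the simplex with regularizer -(1/eta) * sum(log x_i).

    The update satisfies 1/x_new_i = 1/x_i + eta * (loss_est_i - lam), where
    the scalar lam is found by bisection so that x_new sums to one.
    """
    lo = loss_est.min()                      # at this lam the new weights sum to <= 1
    hi = (loss_est + 1.0 / (eta * x)).min()  # at this lam one denominator reaches zero
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        total = np.sum(1.0 / (1.0 / x + eta * (loss_est - lam)))
        if total > 1.0:
            hi = lam
        else:
            lo = lam
    new_x = 1.0 / (1.0 / x + eta * (loss_est - lo))
    return new_x / new_x.sum()               # remove the residual bisection error

def exploitability(A, x, y):
    """Exploitability gap of (x, y) when the row player minimizes x^T A y."""
    return float(np.max(x @ A) - np.min(A @ y))

def self_play(A, steps, eta, seed=0):
    """Uncoupled self-play: each player sees only its own action and payoff."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    for _ in range(steps):
        i, j = rng.choice(n, p=x), rng.choice(m, p=y)
        # Importance-weighted estimates (an assumption of this sketch).
        lx = np.zeros(n)
        lx[i] = A[i, j] / x[i]
        ly = np.zeros(m)
        ly[j] = (1.0 - A[i, j]) / y[j]
        x = log_barrier_omd_step(x, lx, eta)
        y = log_barrier_omd_step(y, ly, eta)
    return x, y

if __name__ == "__main__":
    A = np.array([[1.0, 0.0], [0.0, 1.0]])   # matching pennies with payoffs in [0, 1]
    x, y = self_play(A, steps=2000, eta=0.05)
    print("last-iterate exploitability:", exploitability(A, x, y))
```

Reporting the exploitability of the final pair (x, y), rather than of the averaged iterates, is what "last-iterate" refers to. The log-barrier keeps every action probability strictly positive, so the importance weights 1/x[i] stay finite.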

Problem

Research questions and friction points this paper is trying to address.

last-iterate convergence

zero-sum matrix games

bandit feedback

uncoupled learning

exploitability gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

log-barrier regularization

last-iterate convergence

bandit feedback