Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

📅 2026-04-16

🤖 AI Summary
This work addresses the challenge of achieving optimal last-iterate convergence in zero-sum matrix games with bandit feedback, a setting where existing uncoupled algorithms fall short. The authors propose an online mirror descent method with log-barrier regularization, analyzed through a dual-space framework. In the uncoupled setting it attains, for the first time, an exploitability-gap convergence rate of Õ(t^{-1/4}) with high probability. This matches the known lower bound of Ω(t^{-1/4}) up to logarithmic factors. The approach is further extended to extensive-form games with the same rate, advancing the theoretical foundations of uncoupled multi-agent learning.
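The rates above are stated in terms of the exploitability gap. The sketch below shows one common way to compute it for a mixed-strategy profile (x, y). It assumes that the row player minimizes the expected payoff xᵀAy and the column player maximizes it. The function name exploitability_gap is illustrative. The paper's exact definition may differ by a constant factor, for instance if it uses the maximum rather than the sum of the two players' individual exploitabilities.

```python
def exploitability_gap(A, x, y):
    """Duality gap of the profile (x, y) in the zero-sum game where the row
    player minimizes x^T A y and the column player maximizes it.

    Equals max_j (x^T A)_j - min_i (A y)_i, i.e. the sum of both players'
    best-response improvements; it is >= 0 and is 0 exactly at a Nash equilibrium.
    """
    n, m = len(A), len(A[0])
    # Best value the column player can reach against the fixed row strategy x.
    best_col = max(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    # Best (lowest) value the row player can reach against the fixed column strategy y.
    best_row = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_col - best_row
```

For the 2x2 game A = [[1.0, 0.0], [0.0, 1.0]], uniform play by both players gives a gap of 0.0 (it is the equilibrium). If both players put all their mass on the first action, the gap is 1.0.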

📝 Abstract
We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound of Ω(t^{-1/4}) on the exploitability gap. Several online mirror descent algorithms have been proposed for this problem, but none has yet fully attained this rate. We show that log-barrier regularization, combined with a dual-focused analysis, achieves this Õ(t^{-1/4}) convergence with high probability. We also extend the idea to extensive-form games, proving a bound with the same rate.
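The sketch below shows what an uncoupled, bandit-feedback log-barrier mirror-descent update can look like. It makes the following assumptions:

- Standard importance-weighted loss estimates.
- A fixed step size eta.
- The row player minimizes the observed payoff A[i][j] and the column player maximizes it.

This is plain online mirror descent with the regularizer ψ(x) = -(1/η) Σ_i log x_i, not the authors' algorithm. Their step-size schedule, any extra regularization or exploration terms, and the dual-space analysis behind the Õ(t^{-1/4}) guarantee are not reproduced, and this fixed-step version carries no last-iterate guarantee. The names log_barrier_step and self_play are hypothetical.

On the simplex, the first-order optimality condition gives 1/x'_i = 1/x_i + η(ℓ̂_i + λ), where λ is the normalization multiplier. Since λ has no closed form, the code finds it by bisection.

```python
import random


def log_barrier_step(x, loss_est, eta, iters=100):
    """One mirror-descent step on the probability simplex with the log-barrier
    regularizer psi(x) = -(1/eta) * sum(log x_i).

    Optimality gives 1/x_new_i = 1/x_i + eta * (loss_est_i + lam), where lam is
    the multiplier making x_new sum to 1; lam is located by bisection.
    """
    n = len(x)
    # Each denominator equals eta * (lam - shift_i); c is the largest shift_i.
    c = max(-1.0 / (eta * xi) - li for xi, li in zip(x, loss_est))

    def candidate(lam):
        return [1.0 / (1.0 / xi + eta * (li + lam)) for xi, li in zip(x, loss_est)]

    # At lam = c + 1/eta the maximizing coordinate equals 1, so the sum is >= 1;
    # at lam = c + n/eta every coordinate is <= 1/n, so the sum is <= 1.
    lo, hi = c + 1.0 / eta, c + n / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(candidate(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    x_new = candidate(hi)
    total = sum(x_new)
    return [v / total for v in x_new]  # renormalize away bisection round-off


def self_play(A, T, eta, seed=0):
    """Uncoupled self-play with bandit feedback: each round both players sample
    an action and observe only the scalar payoff A[i][j]. The row player
    minimizes the payoff; the column player maximizes it."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    for _ in range(T):
        i = rng.choices(range(n), weights=x)[0]
        j = rng.choices(range(m), weights=y)[0]
        payoff = A[i][j]
        # Importance-weighted unbiased loss estimates; only the sampled entry is nonzero.
        loss_x = [payoff / x[i] if k == i else 0.0 for k in range(n)]
        loss_y = [-payoff / y[j] if k == j else 0.0 for k in range(m)]  # maximizer's loss
        x = log_barrier_step(x, loss_x, eta)
        y = log_barrier_step(y, loss_y, eta)
    return x, y  # the last iterate, not an average


if __name__ == "__main__":
    print(self_play([[1.0, 0.0], [0.0, 1.0]], T=1000, eta=0.05))
```

The bisection bracket comes straight from the update rule. At the lower end, the coordinate with the largest shift receives probability exactly 1. At the upper end, every coordinate is at most 1/n, so the normalizing λ lies in between. The log-barrier also keeps every probability strictly positive, so the importance weights 1/x[i] stay finite. Each player uses only its own sampled action, its own mixed strategy, and the observed payoff, so the loop is uncoupled in the sense used above.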
Problem

Research questions and friction points this paper is trying to address.

last-iterate convergence
zero-sum matrix games
bandit feedback
uncoupled learning
exploitability gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

log-barrier regularization
last-iterate convergence
bandit feedback
zero-sum matrix games
online mirror descent