Efficient Last-Iterate Convergence in Regret Minimization via Adaptive Reward Transformation

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Regret minimization methods typically guarantee only average-policy convergence, which requires substantial computation or introduces approximation errors. Existing reward transformation (RT) frameworks achieve last-iterate convergence but rely on manually tuned hyperparameters that often violate theoretical assumptions, leading to oscillation, slow convergence, or suboptimal solutions. Method: We propose an adaptive reward transformation mechanism that dynamically adjusts parameters within the RTRM/RTCFR framework, eliminating manual intervention and aligning theoretical convergence guarantees with empirical performance. The mechanism balances exploration and exploitation online to improve regret accumulation. Contribution/Results: Our approach achieves linear last-iterate convergence across diverse games, ensuring that the final-iterate policy efficiently and stably approximates a Nash equilibrium. Experiments on standard benchmarks demonstrate significant improvements over current state-of-the-art algorithms.

📝 Abstract
Regret minimization is a powerful method for finding Nash equilibria in Normal-Form Games (NFGs) and Extensive-Form Games (EFGs), but it typically guarantees convergence only for the average strategy. Computing the average strategy, however, requires significant computational resources or introduces additional errors, limiting its practical applicability. The Reward Transformation (RT) framework was introduced into regret minimization to achieve last-iterate convergence through reward function regularization. It faces practical challenges, however: its performance is highly sensitive to manually tuned parameters, which often deviate from the theoretical convergence conditions, leading to slow convergence, oscillation, or stagnation in local optima. Inspired by previous work, we propose an adaptive technique that addresses these issues, ensuring better consistency between theoretical guarantees and practical performance for RT Regret Matching (RTRM), RT Counterfactual Regret Minimization (RTCFR), and their variants when solving NFGs and EFGs. Our adaptive methods dynamically adjust parameters, balancing exploration and exploitation while improving regret accumulation, ultimately enhancing asymptotic last-iterate convergence and achieving linear convergence. Experimental results demonstrate that our methods significantly accelerate convergence, outperforming state-of-the-art algorithms.
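As background for the abstract's setting, the sketch below runs vanilla regret matching in self-play on rock-paper-scissors, a small zero-sum NFG. It illustrates the baseline property the paper starts from: only the average strategy converges to the Nash equilibrium. This is a generic textbook sketch, not the paper's RTRM algorithm; all names and parameters are illustrative.

```python
import numpy as np

# Row player's payoff matrix u[a, b] for actions (rock, paper, scissors);
# the game is zero-sum, so the column player's payoff is -u[a, b].
U = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def rm_strategy(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

def selfplay(T=20000, seed=0):
    rng = np.random.default_rng(seed)
    n = U.shape[0]
    # Tiny random initial regrets break the symmetric fixed point at uniform.
    reg1 = rng.random(n) * 1e-2
    reg2 = rng.random(n) * 1e-2
    avg1 = np.zeros(n)
    avg2 = np.zeros(n)
    for _ in range(T):
        s1, s2 = rm_strategy(reg1), rm_strategy(reg2)
        avg1 += s1
        avg2 += s2
        # Expected payoff of each pure action against the opponent's mix.
        u1 = U @ s2            # row player
        u2 = -(U.T @ s1)       # column player (zero-sum)
        # Accumulate instantaneous regrets against the played mixture.
        reg1 += u1 - s1 @ u1
        reg2 += u2 - s2 @ u2
    return avg1 / T, avg2 / T

avg1, avg2 = selfplay()
# The AVERAGE strategies approach the unique NE (1/3, 1/3, 1/3),
# while the per-iterate strategies keep cycling around it.
```

By the standard folk theorem for no-regret self-play in zero-sum games, the averaged strategies' exploitability shrinks at roughly O(1/sqrt(T)), which is exactly why practical methods must either store the average (extra cost) or approximate it (extra error), motivating last-iterate approaches like RT.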
Problem

Research questions and friction points this paper is trying to address.

Achieving last-iterate convergence in regret minimization algorithms
Overcoming sensitivity to manual parameter tuning in reward transformation
Enhancing practical performance for solving Normal-Form and Extensive-Form Games
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive parameter adjustment technique
Dynamic balancing of exploration and exploitation
Enhanced asymptotic last-iterate convergence
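To illustrate the reward-transformation idea these innovations build on, the sketch below applies entropy-regularized (softmax) best responses on rock-paper-scissors with an annealed regularization weight. Unlike the paper's adaptive mechanism, the weight here follows a fixed hand-picked schedule, and all function and parameter names are illustrative; the point is only that with a regularized reward the last iterate itself converges to the equilibrium.

```python
import numpy as np

U = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])  # rock-paper-scissors, row player's payoffs

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def regularized_selfplay(T=500, alpha0=2.0, alpha_min=0.9, decay=0.99):
    """Smoothed best-response iteration on entropy-regularized rewards.

    Each player best responds to its expected payoffs with an entropy
    regularizer of weight alpha, i.e. plays softmax(u / alpha). For alpha
    large enough the update map is a contraction, so the LAST iterate
    converges. Here alpha is annealed on a fixed schedule; the paper's
    mechanism adjusts such parameters adaptively online instead.
    """
    rng = np.random.default_rng(0)
    s1 = rng.dirichlet(np.ones(3))
    s2 = rng.dirichlet(np.ones(3))
    alpha = alpha0
    for _ in range(T):
        # Simultaneous smoothed best responses (tuple RHS evaluated first).
        s1, s2 = softmax((U @ s2) / alpha), softmax((-(U.T) @ s1) / alpha)
        alpha = max(alpha_min, alpha * decay)
    return s1, s2

s1, s2 = regularized_selfplay()
# Both last iterates converge to the unique NE (1/3, 1/3, 1/3) of RPS.
```

The schedule matters: anneal too fast and the contraction property is lost, producing the oscillation or stagnation described above, which is precisely the failure mode the adaptive adjustment is designed to avoid.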
Hang Ren
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China
Yulin Wu
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
Shuhan Qi
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
Jiajia Zhang
Department of Epidemiology and Biostatistics, University of South Carolina
Xiaozhen Sun
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China
Tianzi Ma
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China
Xuan Wang
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China; Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies