Refined Analysis of Entropy-Regularized Actor-Critic

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates how the critic affects policy-update variance and convergence in entropy-regularized actor-critic algorithms. For finite, discounted environments with entropy regularization, it proves that an exact critic, used as a baseline, reduces the variance of the policy gradient in a strong sense; when the critic is instead learned, the variance reduction and rapid convergence are preserved provided its error is sufficiently small. With an exact critic, actor-critic with stochastic gradients reaches an ε-optimal regularized value with only Õ(log(1/ε)) samples, matching the sample complexity of deterministic policy gradient. These findings underscore the importance of prioritizing accurate critic learning in such frameworks.

📝 Abstract

In this paper, we study the role of the critic in actor--critic for entropy-regularized, finite, discounted environments. We establish that, when the critic is exact, using it as a baseline is a variance-reduction method in a strong sense. In this case, actor--critic with stochastic gradients matches the sample complexity of deterministic policy gradient, reaching an $ε$-optimal regularized value with $\tilde{O}(\log(1/ε))$ samples. In practice, the critic is learned alongside the actor: the variance of the actor update is then influenced by the critic's variance and bias. Specifically, when the critic has a sufficiently small error, the variance reduction and rapid convergence are preserved. This suggests learning the critic first and keeping it up to date after each actor update, which underscores the crucial role of accurate critic estimation in actor--critic methods.
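
The baseline effect described in the abstract can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not taken from the paper: a single-state (bandit) problem, a softmax policy, and hypothetical values for the rewards and the temperature `tau`. It compares the score-function gradient estimator with and without the exact regularized value as a baseline. In this toy setting, at the regularized optimum the quantity `r(a) - tau * log pi(a)` is the same for every action, so subtracting the exact value makes every gradient sample vanish, while the estimator without a baseline keeps a nonzero variance.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def gradient_samples(theta, rewards, tau, n_samples, use_baseline, rng):
    """Score-function samples of the entropy-regularized policy gradient.

    Toy single-state setting (an assumption of this sketch): the objective is
    J(theta) = sum_a pi(a) * (r(a) - tau * log pi(a)).
    """
    pi = softmax(theta)
    soft_q = rewards - tau * np.log(pi)            # regularized payoff per action
    baseline = pi @ soft_q if use_baseline else 0.0  # exact regularized value
    actions = rng.choice(len(pi), size=n_samples, p=pi)
    score = np.eye(len(pi))[actions] - pi          # grad of log pi(a) for softmax
    return score * (soft_q[actions] - baseline)[:, None]


def total_variance(samples):
    """Sum of per-coordinate variances of the gradient samples."""
    return samples.var(axis=0).sum()


rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.5, -0.2])  # hypothetical rewards
tau = 0.5                             # hypothetical temperature
theta_opt = rewards / tau             # optimal logits: pi* proportional to exp(r / tau)

var_plain = total_variance(
    gradient_samples(theta_opt, rewards, tau, 5000, False, rng))
var_baseline = total_variance(
    gradient_samples(theta_opt, rewards, tau, 5000, True, rng))

print("variance without baseline:", var_plain)
print("variance with exact baseline:", var_baseline)
```

Away from the optimum the baseline does not remove the variance entirely, and a learned critic would add its own bias and variance to the estimator, which is the regime the abstract addresses with the small-error condition.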

Problem

Research questions and friction points this paper is trying to address.

actor-critic

entropy regularization

variance reduction

critic estimation

sample complexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy regularization

actor-critic

variance reduction