🤖 AI Summary
Existing offline reinforcement learning (RL) methods predominantly employ asymmetric f-divergences, such as the KL divergence, for behavioral regularization, since these yield analytic regularized policies and a numerically stable minimization objective; symmetric f-divergences have been largely overlooked because they admit no closed-form regularized policy and can incur numerical issues when used as a loss.
Method: This paper introduces symmetric f-divergences into behavioral regularization for the first time, proposing an analytically tractable policy optimization framework based on a finite Taylor expansion of the f-divergence, which yields explicit closed-form policy updates. By decomposing the symmetric divergence into an asymmetry term and a conditional-symmetry term and Taylor-expanding the latter, the loss is made numerically stable.
Contribution/Results: The resulting algorithm, Symmetric $f$ Actor-Critic (S$f$-AC), performs competitively with mainstream offline RL algorithms on MuJoCo benchmarks and distribution-approximation tasks. It pairs theoretical rigor, via principled symmetric regularization, with empirical robustness, providing a foundation for stable and expressive offline policy learning.
📝 Abstract
This paper introduces symmetric divergences into behavior-regularized policy optimization (BRPO) to establish a novel offline RL framework. Existing methods focus on asymmetric divergences such as KL to obtain analytic regularized policies and a practical minimization objective. We show that symmetric divergences do not permit an analytic policy as the regularizer and can incur numerical issues when used as a loss. We tackle these challenges via the Taylor series of the $f$-divergence. Specifically, we prove that an analytic policy can be obtained with a finite series. For the loss, we observe that symmetric divergences can be decomposed into an asymmetry term and a conditional-symmetry term; Taylor-expanding the latter alleviates the numerical issues. Putting these together, we propose Symmetric $f$ Actor-Critic (S$f$-AC), the first practical BRPO algorithm with symmetric divergences. Experimental results on distribution approximation and MuJoCo verify that S$f$-AC performs competitively.
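To make the Taylor-expansion idea concrete, here is a minimal illustrative sketch (not the paper's algorithm; the function names are ours) using the Jensen-Shannon divergence, a standard symmetric $f$-divergence. For normalized distributions, $\sum_x q(x)\,(p(x)/q(x) - 1) = 0$, so a second-order Taylor expansion of $f$ around $1$ reduces any $f$-divergence to $\tfrac{f''(1)}{2}\,\chi^2(P\,\|\,Q)$; for JS, $f''(1) = \tfrac{1}{4}$. The quadratic surrogate avoids the logarithms of the exact divergence, which is the kind of numerical simplification the paper exploits:

```python
import numpy as np

def js_divergence(p, q):
    """Exact Jensen-Shannon divergence, a symmetric f-divergence:
    JS(P||Q) = 0.5*KL(P||M) + 0.5*KL(Q||M) with M = (P+Q)/2."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_taylor2(p, q):
    """Second-order Taylor approximation around p/q = 1:
    D_f(P||Q) ~ (f''(1)/2) * chi^2(P||Q), and f''(1) = 1/4 for JS,
    giving (1/8) * sum((p - q)^2 / q). No logarithms involved."""
    return 0.125 * np.sum((p - q) ** 2 / q)

# Two nearby discrete distributions: the quadratic surrogate
# closely tracks the exact symmetric divergence.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
print(js_divergence(p, q), js_taylor2(p, q))
```

The approximation is accurate when $P$ and $Q$ are close (the regime behavioral regularization enforces), while remaining finite and smooth even where density ratios make the exact log-based loss ill-behaved.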