Rectifying Regression in Reinforcement Learning

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper investigates how the choice of loss function affects the suboptimality gap of policies learned by value-function-based reinforcement learning. The authors systematically compare the theoretical properties and empirical performance of several loss functions, including mean absolute error (MAE), mean squared error (MSE), and binary/categorical cross-entropy, under distinct regression targets (e.g., Bellman residual, advantage-weighted targets). Their analysis shows that MAE and cross-entropy losses intrinsically yield tighter bounds on the policy suboptimality gap than MSE. Building on this insight, they propose an advantage-aware cross-entropy regression objective. Experiments in the linear RL setting demonstrate that the method consistently reduces the suboptimality gap and robustly outperforms MSE-based baselines. The work provides both theoretical grounding and practical guidance for the joint design of loss functions and regression targets in value-function learning.
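To make the "cross-entropy as a regression loss" idea concrete, here is a minimal NumPy sketch of one standard way to do it: project a scalar value target onto a categorical distribution over a fixed support (a "two-hot" encoding) and train with categorical cross-entropy. The function names and bin layout are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def two_hot(targets, bins):
    """Project scalar value targets onto a categorical distribution over a
    fixed support `bins`, splitting mass between the two nearest bins so the
    encoding is exact in expectation (probs @ bins == targets)."""
    targets = np.clip(targets, bins[0], bins[-1])
    idx = np.clip(np.searchsorted(bins, targets, side="right") - 1,
                  0, len(bins) - 2)
    w_hi = (targets - bins[idx]) / (bins[idx + 1] - bins[idx])
    probs = np.zeros((len(targets), len(bins)))
    rows = np.arange(len(targets))
    probs[rows, idx] = 1.0 - w_hi
    probs[rows, idx + 1] = w_hi
    return probs

def cross_entropy(logits, target_probs):
    """Categorical cross-entropy between predicted logits and target probs."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(target_probs * log_softmax).sum(axis=1).mean()

bins = np.linspace(0.0, 10.0, 21)   # value support, step 0.5 (illustrative)
targets = np.array([2.3, 7.75])     # scalar regression targets
probs = two_hot(targets, bins)      # probs @ bins recovers targets exactly
loss = cross_entropy(np.zeros((2, len(bins))), probs)  # uniform prediction
```

A value network trained this way outputs logits over `bins` and reads off its scalar prediction as the expectation of the induced distribution.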

📝 Abstract
This paper investigates the impact of the loss function in value-based methods for reinforcement learning through an analysis of the underlying prediction objectives. We theoretically show that mean absolute error is a better prediction objective than the traditional mean squared error for controlling the learned policy's suboptimality gap. Furthermore, we show that different loss functions are better aligned with different regression objectives: the binary and categorical cross-entropy losses with the mean absolute error, and the squared loss with the mean squared error. We then provide empirical evidence that algorithms minimizing these cross-entropy losses can outperform those based on the squared loss in linear reinforcement learning.
Problem

Research questions and friction points this paper is trying to address.

Analyzing the impact of the loss function on value-based reinforcement learning methods
Demonstrating that the mean absolute error objective gives better control of the policy suboptimality gap
Showing that cross-entropy losses can outperform the squared loss in linear RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses mean absolute error as the prediction objective for policy optimization
Employs cross-entropy losses aligned with the mean absolute error objective
Applies binary and categorical cross-entropy losses in value-function learning
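As a toy illustration of why the choice of prediction objective matters (our own sketch, not an experiment from the paper): the MSE-optimal predictor is the mean of the targets, while the MAE-optimal predictor is the median, and the two can incur very different absolute errors, which is the quantity the paper ties to the suboptimality gap.

```python
import numpy as np

# Ten hypothetical bootstrapped value targets for one state, with one outlier.
values = np.array([1.0] * 9 + [10.0])

mean_pred = values.mean()        # minimizes mean squared error -> 1.9
median_pred = np.median(values)  # minimizes mean absolute error -> 1.0

mae_of_mean = np.abs(values - mean_pred).mean()      # 1.62
mae_of_median = np.abs(values - median_pred).mean()  # 0.90
```

The MAE-optimal prediction is robust to the outlier target and achieves a smaller absolute error, consistent with the paper's claim that MAE-aligned losses give tighter control of the suboptimality gap.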