🤖 AI Summary
In reward-free settings, rating-based reinforcement learning (RbRL) can be highly sensitive to hyperparameter choices. Method: This paper systematically investigates the impact of key hyperparameters, including the learning rate and the rating sampling strategy, on reward reconstruction accuracy and policy convergence. RbRL models the consistency between human ratings and estimated ratings via a cross-entropy loss, and the paper conducts ablation studies and controlled experiments across these hyperparameters. Contribution/Results: As a work in progress, it offers empirically grounded guidelines for selecting RbRL hyperparameters, helping bridge the gap between theoretical development and practical deployment of implicit-reward-driven policy learning under real human feedback.
📝 Abstract
This paper explores multiple optimization methods to improve the performance of rating-based reinforcement learning (RbRL). RbRL, a method built on human ratings, infers reward functions in reward-free environments so that a policy can subsequently be learned via standard reinforcement learning, which requires a reward function to be available. Specifically, RbRL minimizes a cross-entropy loss that quantifies the difference between human ratings and the estimated ratings derived from the inferred reward; a low loss therefore indicates a high degree of consistency between the two. Despite its simple form, RbRL has several hyperparameters and can be sensitive to them. It is therefore important to conduct comprehensive experiments to understand the impact of these hyperparameters on RbRL's performance. This paper is a work in progress that provides users with general guidelines on how to select hyperparameters in RbRL.
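To make the loss described above concrete, here is a minimal sketch of a cross-entropy objective between human rating labels and estimated ratings derived from an inferred reward. This is not the paper's implementation: the mapping from a segment's estimated return to rating-class probabilities (a softmax over negative distances to per-class centers, with a temperature) and all function and parameter names (`estimated_rating_probs`, `rating_cross_entropy`, `class_centers`, `temperature`) are illustrative assumptions.

```python
import numpy as np

def estimated_rating_probs(returns, class_centers, temperature=1.0):
    """Map each segment's estimated return (computed from the inferred
    reward) to a probability distribution over rating classes.

    Assumption: probabilities come from a softmax over the negative
    distances to hypothetical per-class centers, so returns close to a
    class center get high probability for that rating.
    """
    dists = np.abs(returns[:, None] - class_centers[None, :])  # shape (N, K)
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def rating_cross_entropy(returns, human_ratings, class_centers, temperature=1.0):
    """Cross-entropy between human rating labels (integer class indices)
    and the estimated rating distributions. A low loss means the
    estimated ratings agree closely with the human ratings."""
    probs = estimated_rating_probs(returns, class_centers, temperature)
    n = len(human_ratings)
    # negative log-likelihood of the human-assigned class for each segment
    return -np.mean(np.log(probs[np.arange(n), human_ratings] + 1e-12))
```

In this sketch, gradient descent on the loss with respect to the reward model's parameters (which produce `returns`) would push estimated returns toward the centers of their human-assigned rating classes, which is the consistency the abstract describes.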