A Method for Evaluating Hyperparameter Sensitivity in Reinforcement Learning

📅 2024-12-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
Reinforcement learning (RL) algorithm performance is frequently overestimated, since observed improvements often stem from hyperparameter tuning rather than intrinsic algorithmic advances. Method: This work proposes a systematic definition and quantification of algorithmic hyperparameter sensitivity via a scalable, cross-environment empirical evaluation framework. It combines grid and random hyperparameter sampling with variance decomposition, validated across diverse benchmarks (including MuJoCo and Procgen) and augmented with statistical significance testing. Contribution/Results: The analysis reveals that the purported performance gains of several PPO normalization variants come with substantially increased hyperparameter sensitivity, indicating degraded robustness. The result is a standardized metric for hyperparameter sensitivity in RL that enables de-biased, tuning-agnostic algorithm comparison. By exposing hidden trade-offs between peak performance and robustness, the framework supports reproducible, trustworthy RL research.

📝 Abstract
The performance of modern reinforcement learning algorithms critically relies on tuning ever-increasing numbers of hyperparameters. Often, small changes in a hyperparameter can lead to drastic changes in performance, and different environments require very different hyperparameter settings to achieve state-of-the-art performance reported in the literature. We currently lack a scalable and widely accepted approach to characterizing these complex interactions. This work proposes a new empirical methodology for studying, comparing, and quantifying the sensitivity of an algorithm's performance to hyperparameter tuning for a given set of environments. We then demonstrate the utility of this methodology by assessing the hyperparameter sensitivity of several commonly used normalization variants of PPO. The results suggest that several algorithmic performance improvements may, in fact, be a result of an increased reliance on hyperparameter tuning.
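The paper's exact sensitivity metric is not reproduced here, but the idea of measuring how much an algorithm's performance depends on per-environment tuning can be sketched as follows. This is a minimal illustration, assuming performance scores are already collected for each (environment, hyperparameter configuration) pair; the function name and the specific gap-based formula are this sketch's own choices, not the paper's definition.

```python
import numpy as np

def hyperparameter_sensitivity(scores):
    """Illustrative sensitivity score for one algorithm.

    scores: array of shape (n_envs, n_configs), where scores[e, c] is the
    normalized return of hyperparameter configuration c on environment e.
    Returns a value in [0, 1]: near 0 means a single configuration is
    close to best everywhere (robust); larger values mean reported
    performance leans on per-environment tuning.
    """
    scores = np.asarray(scores, dtype=float)
    per_env_best = scores.max(axis=1)             # tune separately per env
    shared_cfg = scores.mean(axis=0).argmax()     # single best shared config
    shared = scores[:, shared_cfg]
    # relative gap between per-env tuning and the shared configuration
    gap = (per_env_best - shared) / np.maximum(np.abs(per_env_best), 1e-8)
    return float(gap.mean())

# Two toy algorithms over 2 environments x 3 configurations:
robust = [[1.0, 0.9, 0.8],
          [0.9, 1.0, 0.9]]   # one config is near-best everywhere
fragile = [[1.0, 0.2, 0.1],
           [0.1, 0.2, 1.0]]  # each env needs its own config

print(hyperparameter_sensitivity(robust))   # small gap
print(hyperparameter_sensitivity(fragile))  # large gap
```

Under this toy formalization, an algorithm whose headline numbers require a different configuration in every environment scores as more sensitive, which mirrors the paper's finding that some PPO variants trade robustness for tuned peak performance.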
Problem

Research questions and friction points this paper is trying to address.

Evaluating hyperparameter sensitivity in reinforcement learning
Lack of a scalable approach for characterizing hyperparameter interactions
Assessing sensitivity of PPO normalization variants
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hyperparameter sensitivity evaluation method
Scalable empirical methodology for RL
Assessment of PPO normalization variants