🤖 AI Summary
Problem: Gradient-boosted trees (GBTs) struggle to adapt to the non-stationary data distributions inherent in online reinforcement learning (RL), so they have seen limited use there. Method: This paper proposes GBRL, a framework that brings GBTs to online RL and supports actor-critic algorithms. It shares a single tree ensemble between the policy and value functions, assigns each its own learning rate, and runs on the GPU, while preserving GBTs' native support for structured and categorical features, their interpretability, and their lightweight deployment footprint. Contribution/Results: GBRL performs competitively with neural-network baselines across a diverse set of RL tasks and excels in domains with structured or categorical features. The authors release an open-source, GPU-accelerated GBRL library (GitHub: NVlabs/gbrl) that integrates with widely used RL libraries such as Stable-Baselines3. GBRL thereby narrows the gap between gradient boosting and online RL.
📝 Abstract
Neural networks (NNs) achieve remarkable results in various tasks, but lack key characteristics: interpretability, support for categorical features, and lightweight implementations suitable for edge devices. While ongoing efforts aim to address these challenges, Gradient Boosting Trees (GBTs) inherently meet these requirements. As a result, GBTs have become the go-to method for supervised learning tasks in many real-world applications and competitions. However, their application in online learning scenarios, notably in reinforcement learning (RL), has been limited. In this work, we bridge this gap by introducing Gradient-Boosting RL (GBRL), a framework that extends the advantages of GBTs to the RL domain. Using the GBRL framework, we implement various actor-critic algorithms and compare their performance with their NN counterparts. Inspired by shared backbones in NNs, we introduce a tree-sharing approach for policy and value functions with distinct learning rates, enhancing learning efficiency over millions of interactions. GBRL achieves competitive performance across a diverse array of tasks, excelling in domains with structured or categorical features. Additionally, we present a high-performance, GPU-accelerated implementation that integrates seamlessly with widely used RL libraries (available at https://github.com/NVlabs/gbrl). GBRL expands the toolkit for RL practitioners, demonstrating the viability and promise of GBTs within the RL paradigm.
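The tree-sharing idea in the abstract can be illustrated with a small NumPy sketch. It is not the GBRL library's API or the authors' implementation. The names `fit_stump` and `SharedTreeEnsemble`, the depth-1 trees, the learning-rate values, and the toy supervised gradients are all assumptions made for illustration. Each boosting round fits one tree to the stacked gradients of the policy logits and the value estimate, and each output's leaf values are then scaled by that output's own learning rate.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_stump(X, G):
    """Fit one depth-1 tree shared by all outputs (illustrative helper).

    X: (n, d) features. G: (n, k) per-sample gradients for k outputs.
    Returns (feature, threshold, left_leaf, right_leaf); each leaf is the
    k-vector of mean gradients of the samples that fall into it.
    """
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # thresholds keeping both sides non-empty
            left = X[:, j] <= t
            gl, gr = G[left].mean(axis=0), G[~left].mean(axis=0)
            err = ((G[left] - gl) ** 2).sum() + ((G[~left] - gr) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, gl, gr)
    if best is None:  # every feature is constant: fall back to a single leaf
        mean = G.mean(axis=0)
        return 0, np.inf, mean, mean
    return best[1:]

class SharedTreeEnsemble:
    """Hypothetical shared ensemble: n_actions policy logits plus one value."""

    def __init__(self, n_actions, lr_policy=0.1, lr_value=0.5):
        self.k = n_actions + 1
        # Distinct learning rates: one for the policy outputs, one for the value.
        self.lr = np.array([lr_policy] * n_actions + [lr_value])
        self.trees = []

    def predict(self, X):
        out = np.zeros((X.shape[0], self.k))
        for j, t, left_leaf, right_leaf in self.trees:
            mask = X[:, j] <= t
            out += np.where(mask[:, None], left_leaf, right_leaf)
        return out[:, :-1], out[:, -1]  # (logits, values)

    def boost(self, X, grad_logits, grad_value):
        G = np.column_stack([grad_logits, grad_value])
        j, t, gl, gr = fit_stump(X, G)
        # Step against the gradient; each output uses its own learning rate.
        self.trees.append((j, t, -self.lr * gl, -self.lr * gr))

# Toy data standing in for rollouts (assumption: supervised targets, not real RL).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
target_value = (X[:, 0] > 0).astype(float)
onehot = np.eye(2)[(X[:, 1] > 0).astype(int)]

model = SharedTreeEnsemble(n_actions=2)
mse_before = np.mean((model.predict(X)[1] - target_value) ** 2)
for _ in range(50):
    logits, values = model.predict(X)
    # Cross-entropy gradient for the logits, squared-error gradient for the value.
    model.boost(X, softmax(logits) - onehot, values - target_value)
mse_after = np.mean((model.predict(X)[1] - target_value) ** 2)
```

In an actor-critic setting, the two gradient arguments to `boost` would instead come from the policy-gradient loss and the value loss computed on collected rollouts. Sharing the split structure means one tree per round serves both outputs, which is the rough analogue of a shared NN backbone with separate heads.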