🤖 AI Summary
Offline reinforcement learning (RL) faces severe scalability challenges in high-dimensional, discrete combinatorial action spaces: the joint action space grows exponentially with the number of sub-actions, and modeling the dependencies among those sub-actions is difficult.
Method: This paper proposes Branch Value Estimation (BraVE), a tree-traversal-based method for estimating branch values. It introduces a hierarchical action decomposition coupled with a top-down traversal mechanism that models inter-sub-action dependencies while evaluating only a linear number of joint actions, avoiding both the restrictive independence assumptions of conventional factorized Q-networks and the computational intractability of exhaustive evaluation.
Contribution/Results: Evaluated on benchmark environments with over four million possible actions, the approach improves policy performance by up to 20× over state-of-the-art offline RL methods, while converging significantly faster and generalizing better across tasks.
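The core idea, selecting a joint action by descending a tree over sub-actions instead of scoring every combination, can be illustrated with a minimal sketch. This is not the paper's implementation: `branch_value` here is a hypothetical stand-in for a learned Q-network that scores each branch conditioned on the sub-actions already fixed, which is what lets the traversal preserve dependencies while making only a linear number of evaluations.

```python
# Hypothetical sketch of tree-structured greedy action selection.
# With D sub-actions of K choices each, exhaustive evaluation scores
# K**D joint actions; the traversal below fixes one sub-action per
# level, scoring only K branches conditioned on the prefix chosen so
# far (O(D*K) evaluations total). BraVE's actual network architecture
# and training objective are not reproduced here.

from typing import Callable, List, Tuple

def greedy_tree_traversal(
    branch_value: Callable[[Tuple[int, ...], int], float],
    num_subactions: int,
    num_choices: int,
) -> List[int]:
    """Select a joint action by descending the sub-action tree greedily."""
    prefix: Tuple[int, ...] = ()
    for _ in range(num_subactions):
        # Score only the K branches at this level, conditioned on the
        # prefix, so earlier sub-action choices influence later ones.
        best = max(range(num_choices), key=lambda a: branch_value(prefix, a))
        prefix = prefix + (best,)
    return list(prefix)

# Toy value function with a sub-action dependency: choice 1 is only
# valuable if the previous sub-action was also 1.
def toy_value(prefix: Tuple[int, ...], a: int) -> float:
    if a == 1 and (not prefix or prefix[-1] == 1):
        return 1.0
    return 0.0

print(greedy_tree_traversal(toy_value, num_subactions=3, num_choices=2))
# → [1, 1, 1]
```

With 3 binary sub-actions the traversal makes 6 branch evaluations rather than scoring all 8 joint actions; the gap widens exponentially as the number of sub-actions grows.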
📝 Abstract
Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.