BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

📅 2024-10-28
🤖 AI Summary
Offline reinforcement learning (RL) faces severe scalability challenges in high-dimensional discrete combinatorial action spaces, where the joint action space grows exponentially with the number of sub-actions and dependencies among sub-actions are difficult to model.
Method: This paper proposes Branch Value Estimation (BraVE), a value-based method built on tree-structured action traversal. It introduces a hierarchical action decomposition scheme with a top-down traversal mechanism that evaluates only a linear number of joint actions while still modeling dependencies among sub-actions. This avoids both the restrictive independence assumptions of conventional factorized Q-networks and the computational intractability of exhaustive evaluation.
Contribution/Results: In benchmark environments with over four million possible actions, BraVE improves policy performance by up to 20× over state-of-the-art offline RL methods, with faster convergence and better cross-task generalization also reported.
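To make the scaling concrete, the sketch below compares the size of a joint action space with the number of scores a single top-down traversal would compute. It assumes, purely for illustration, that every sub-action has the same number of choices and that the traversal scores each child once per tree level; the 22 binary sub-actions are a hypothetical configuration chosen because 2^22 exceeds four million, not the paper's actual environment.

```python
def joint_action_count(num_subactions: int, choices_per_subaction: int) -> int:
    """Size of the joint action space: each sub-action picks independently."""
    return choices_per_subaction ** num_subactions


def traversal_evaluations(num_subactions: int, choices_per_subaction: int) -> int:
    """Scores computed by one top-down pass, assuming (for this sketch) one
    tree level per sub-action and one score per child at each level."""
    return num_subactions * choices_per_subaction


# Hypothetical setting: 22 binary sub-actions (2**22 = 4,194,304 joint actions).
n, k = 22, 2
print(joint_action_count(n, k))     # 4194304
print(traversal_evaluations(n, k))  # 44
```

Exhaustive evaluation grows exponentially in the number of sub-actions, while the traversal count grows linearly, which is the gap the summary refers to.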

📝 Abstract
Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.
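The abstract's point about factorized Q-values can be checked on a toy case. The snippet below is an illustrative assumption, not taken from the paper: two binary sub-actions whose joint value is high only when they disagree (an XOR pattern). For any additive factorization Q(a1, a2) = q1(a1) + q2(a2), the "interaction" second difference cancels to zero, so no factorized model can reproduce this joint table.

```python
from itertools import product

# Hypothetical joint Q-values for two binary sub-actions with an interaction:
# the value is 1 only when the two sub-actions disagree (XOR pattern).
q_joint = {(a1, a2): float(a1 != a2) for a1, a2 in product((0, 1), repeat=2)}


def interaction(q):
    """Second difference Q(0,0) + Q(1,1) - Q(0,1) - Q(1,0).

    For any additive factorization Q(a1, a2) = q1(a1) + q2(a2) the terms
    cancel exactly, so this quantity is always 0."""
    return q[(0, 0)] + q[(1, 1)] - q[(0, 1)] - q[(1, 0)]


# An arbitrary additive (factorized) Q-function for comparison.
q1, q2 = {0: 0.25, 1: 0.75}, {0: -0.5, 1: 0.5}
q_additive = {(a1, a2): q1[a1] + q2[a2] for a1, a2 in product((0, 1), repeat=2)}

print(interaction(q_joint))     # -2.0
print(interaction(q_additive))  # 0.0
```

The nonzero interaction in the joint table is exactly the kind of joint sub-action effect that per-sub-action Q-heads cannot express.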
Problem

Challenges in offline RL for high-dimensional discrete action spaces
Existing methods either evaluate the joint action space exhaustively or factorize Q-values and lose sub-action dependencies
Proposes BraVE for scalable action evaluation with dependency preservation
Innovation

Tree-structured action traversal for efficient evaluation
Evaluates only a linear number of joint actions while preserving sub-action dependencies
Outperforms prior offline RL methods by up to 20× in environments with over four million actions
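The traversal idea in the lists above can be sketched as a greedy descent over a tree in which each level fixes one sub-action. In this minimal sketch, assuming N sub-actions with K choices each, the hypothetical `branch_value(prefix)` returns the best Q-value reachable below a partial action. It is computed exactly from a small random Q-table as a stand-in for a learned estimator, so it illustrates the counting argument rather than the paper's actual network or training procedure.

```python
import random
from itertools import product

N, K = 6, 3  # hypothetical: 6 sub-actions, 3 choices each (729 joint actions)
rng = random.Random(0)

# Toy joint Q-table standing in for Q(s, a) at one fixed state s.
Q = {a: rng.random() for a in product(range(K), repeat=N)}

queries = 0  # counts calls to the branch-value estimator during traversal


def branch_value(prefix):
    """Max Q over all joint actions that start with `prefix`.

    Computed exactly from the toy table (an oracle stand-in); a learned
    model would instead predict this value from (state, prefix)."""
    global queries
    queries += 1
    return max(q for a, q in Q.items() if a[:len(prefix)] == prefix)


def traverse():
    """Greedy top-down descent: fix one sub-action per tree level by
    picking the child prefix with the highest branch value."""
    prefix = ()
    for _ in range(N):
        prefix = max((prefix + (c,) for c in range(K)), key=branch_value)
    return prefix


best = traverse()
assert Q[best] == max(Q.values())  # exact branch values reach a maximizer
print(len(Q), queries)             # 729 18
```

With exact branch values the descent always lands on a maximizing joint action while making only N·K estimator calls (18 here, versus 729 joint actions). With learned estimates, the descent is only as good as those estimates, which is why the quality of branch value estimation matters.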