🤖 AI Summary
Offline reinforcement learning (RL) faces severe scalability challenges in high-dimensional, discrete combinatorial action spaces: the joint action space grows exponentially with the number of sub-actions, and modeling the dependencies among those sub-actions is difficult.
Method: This paper proposes Branch Value Estimation (BraVE), a tree-traversal-based method for estimating branch values. It introduces a hierarchical action decomposition coupled with a top-down traversal mechanism that models inter-sub-action dependencies while evaluating only a linear number of joint actions, avoiding both the restrictive independence assumptions of conventional factorized Q-networks and the computational intractability of exhaustive evaluation.
Contribution/Results: Evaluated on benchmark environments with over four million possible actions, the approach improves policy performance by up to 20× over state-of-the-art offline RL methods, while converging significantly faster and generalizing better across tasks.
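The core idea, selecting a joint action by descending a tree over sub-actions instead of scoring every combination, can be illustrated with a minimal sketch. This is not the paper's implementation: `branch_value` here is a hypothetical stand-in for a learned Q-network that scores each branch conditioned on the sub-actions already fixed, which is what lets the traversal preserve dependencies while making only a linear number of evaluations.

```python
# Hypothetical sketch of tree-structured greedy action selection.
# With D sub-actions of K choices each, exhaustive evaluation scores
# K**D joint actions; the traversal below fixes one sub-action per
# level, scoring only K branches conditioned on the prefix chosen so
# far (O(D*K) evaluations total). BraVE's actual network architecture
# and training objective are not reproduced here.

from typing import Callable, List, Tuple

def greedy_tree_traversal(
    branch_value: Callable[[Tuple[int, ...], int], float],
    num_subactions: int,
    num_choices: int,
) -> List[int]:
    """Select a joint action by descending the sub-action tree greedily."""
    prefix: Tuple[int, ...] = ()
    for _ in range(num_subactions):
        # Score only the K branches at this level, conditioned on the
        # prefix, so earlier sub-action choices influence later ones.
        best = max(range(num_choices), key=lambda a: branch_value(prefix, a))
        prefix = prefix + (best,)
    return list(prefix)

# Toy value function with a sub-action dependency: choice 1 is only
# valuable if the previous sub-action was also 1.
def toy_value(prefix: Tuple[int, ...], a: int) -> float:
    if a == 1 and (not prefix or prefix[-1] == 1):
        return 1.0
    return 0.0

print(greedy_tree_traversal(toy_value, num_subactions=3, num_choices=2))
# → [1, 1, 1]
```

With 3 binary sub-actions the traversal makes 6 branch evaluations rather than scoring all 8 joint actions; the gap widens exponentially as the number of sub-actions grows.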
📝 Abstract
Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.