🤖 AI Summary
This work addresses the challenge of reinforcement learning in large discrete combinatorial action spaces, where exponential search complexity hinders effective exploration. Existing approaches either assume independence among sub-actions—leading to invalid action combinations—or jointly learn action structure and policy, resulting in slow and unstable training. To overcome these limitations, the authors propose SPIN, a two-stage framework that decouples action structure modeling from policy learning for the first time. In the first stage, an Action Structure Model (ASM) is pre-trained to learn a valid action manifold; in the second, its representation is frozen while a lightweight policy head is fine-tuned for control. This approach substantially improves stability, sample efficiency, and final performance in offline reinforcement learning, achieving up to a 39% increase in average return and up to 12.8× faster convergence on discrete DM Control benchmarks.
📝 Abstract
Reinforcement learning in discrete combinatorial action spaces requires searching over exponentially many joint actions to simultaneously select multiple sub-actions that form coherent combinations. Existing approaches either simplify policy learning by assuming independence across sub-actions, which often yields incoherent or invalid actions, or attempt to learn action structure and control jointly, which is slow and unstable. We introduce Structured Policy Initialization (SPIN), a two-stage framework that first pre-trains an Action Structure Model (ASM) to capture the manifold of valid actions, then freezes this representation and trains lightweight policy heads for control. On challenging discrete DM Control benchmarks, SPIN improves average return by up to 39% over the state of the art while reducing time to convergence by up to 12.8×.
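To make the two-stage recipe concrete, here is a minimal NumPy sketch: stage one pre-trains a structure model on valid joint actions, stage two freezes it and fits only a lightweight policy head. Everything in this sketch is an assumption for illustration — the linear-autoencoder ASM, the toy "mirrored sub-action" validity structure, the behavior-cloning-style head objective, and all dimensions are stand-ins, not the paper's architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "valid action manifold": joint actions whose second half mirrors the
# first half, standing in for the coherent sub-action combinations that a
# real Action Structure Model would have to capture. (Assumed toy setup.)
def sample_valid_actions(n, d=8):
    half = rng.standard_normal((n, d // 2))
    return np.concatenate([half, half], axis=1)

# Stage 1: pre-train a linear autoencoder as a stand-in for the ASM.
def pretrain_asm(actions, k=4, lr=0.1, steps=800):
    n, d = actions.shape
    W = rng.standard_normal((d, k)) * 0.1   # encoder
    V = rng.standard_normal((k, d)) * 0.1   # decoder
    losses = []
    for _ in range(steps):
        z = actions @ W
        err = z @ V - actions               # reconstruction error
        losses.append(float((err ** 2).mean()))
        W -= lr * actions.T @ (err @ V.T) / n
        V -= lr * z.T @ err / n
    return W, V, losses

# Stage 2: freeze the decoder V; fit only a lightweight linear policy head
# mapping states into the learned action-structure latent space.
def train_policy_head(V, states, expert_actions, lr=0.1, steps=800):
    n, s = states.shape
    H = rng.standard_normal((s, V.shape[0])) * 0.1
    losses = []
    for _ in range(steps):
        err = (states @ H) @ V - expert_actions
        losses.append(float((err ** 2).mean()))
        H -= lr * states.T @ (err @ V.T) / n   # V gets no update: frozen
    return H, losses

actions = sample_valid_actions(512)
W, V, asm_losses = pretrain_asm(actions)

# Toy offline data: a fixed linear "expert" whose targets lie on the manifold.
states = rng.standard_normal((512, 6))
mix = rng.standard_normal((6, 4)) * 0.5
expert = np.concatenate([states @ mix, states @ mix], axis=1)
H, head_losses = train_policy_head(V, states, expert)
```

Because the head decodes through the frozen ASM, its outputs stay on the learned manifold of coherent combinations even though only a small linear map is trained in stage two — this is the decoupling of structure learning from control that the summary describes.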