🤖 AI Summary
This work addresses a key limitation of existing hierarchical offline goal-conditioned reinforcement learning: reliance on value-based goal representations often fails to distinguish states that are critical for action selection, constraining control performance. The authors propose an information-theoretic framework that explicitly differentiates "value sufficiency" from "action sufficiency," showing that the latter is what preserves, at the interface between high-level planning and low-level execution, the information needed for optimal action selection. Through information-theoretic analysis, a hierarchical policy architecture, and log-loss–based training of the low-level policy, the resulting action-sufficient representations consistently outperform conventional value-based goal representations across discrete environments and standard benchmarks, empirically confirming their strong correlation with improved control performance.
📝 Abstract
Hierarchical policies in offline goal-conditioned reinforcement learning (GCRL) address long-horizon tasks by decomposing control into high-level subgoal planning and low-level action execution. A critical design choice in such architectures is the goal representation: the compressed encoding of goals that serves as the interface between these levels. Existing approaches commonly derive goal representations while learning value functions, implicitly assuming that preserving information sufficient for value estimation is adequate for optimal control. We show that this assumption can fail, even when the value estimation is exact, as such representations may collapse goal states that need to be differentiated for action learning. To address this, we introduce an information-theoretic framework that defines action sufficiency, a condition on goal representations necessary for optimal action selection. We prove that value sufficiency does not imply action sufficiency and empirically verify that the latter is more strongly associated with control success in a discrete environment. We further demonstrate that standard log-loss training of low-level policies naturally induces action-sufficient representations. Our experimental results on a popular benchmark demonstrate that our actor-derived representations consistently outperform representations learned via value estimation.
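The gap between value sufficiency and action sufficiency can be illustrated with a minimal sketch. The example below is not from the paper: it assumes a toy 1-D chain environment with reward equal to negative distance-to-goal, and all function names (`phi`, `value`, `optimal_action`) are illustrative. A representation that keeps only the distance to the goal preserves the value exactly, yet collapses two goals that require opposite actions.

```python
# Toy 1-D chain: an agent at position s must reach goal g.
# Optimal value is V(s, g) = -|s - g| (negative steps to goal);
# the optimal action is the sign of (g - s).

def value(s, g):
    """Exact optimal value: negative distance to the goal."""
    return -abs(s - g)

def optimal_action(s, g):
    """+1 = move right, -1 = move left, 0 = already at goal."""
    return 1 if g > s else (-1 if g < s else 0)

def phi(s, g):
    """A value-sufficient goal representation: distance only.
    V can be recovered from it (V = -phi), so value estimation
    is exact under this encoding."""
    return abs(s - g)

s = 5
g_left, g_right = 2, 8  # equidistant goals on opposite sides of s

# Value sufficiency holds: identical codes, identical values ...
assert phi(s, g_left) == phi(s, g_right) == 3
assert value(s, g_left) == value(s, g_right) == -3

# ... but action sufficiency fails: the two goals demand opposite
# actions, and any low-level policy conditioned only on phi
# cannot act optimally for both.
assert optimal_action(s, g_left) != optimal_action(s, g_right)
```

The sketch mirrors the paper's core claim in miniature: exact value estimation through a representation does not guarantee that the representation retains what a low-level policy needs to choose actions.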