🤖 AI Summary
This work addresses actuator inversion in contextual reinforcement learning, a failure mode in which identical actions produce opposite physical effects under a latent binary context, hindering zero-shot generalization. The authors propose DMA*-SH, in which a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights shared across the dynamics model, policy, and action-value function. This shared modulation provides an inductive bias matched to actuator inversion, while input/output normalization and random input masking stabilize context inference. Theoretically, the paper gives an expressivity separation result for hypernetwork modulation, together with a variance decomposition and policy-gradient variance bounds that formalize how within-mode compression improves learning. Empirically, the authors introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate discontinuous context-to-dynamics interactions. On AIB's held-out actuator-inversion tasks, DMA*-SH achieves zero-shot generalization, outperforming domain randomization by 111.8% and a standard context-aware baseline by 16.1%.
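To make the failure mode concrete, the following is a minimal toy sketch of actuator inversion, not an environment from the paper or from AIB. It assumes a hypothetical 1-D point mass whose applied force is the action multiplied by a hidden context in {+1, -1}; the function names, gains, and step size are all illustrative assumptions.

```python
def step(position, velocity, action, context, dt=0.1):
    """One step of a hypothetical 1-D point mass.

    `context` is +1 (normal actuator) or -1 (inverted actuator); the agent
    never observes it directly.
    """
    force = context * action          # same action, opposite physical effect
    velocity = velocity + dt * force
    position = position + dt * velocity
    return position, velocity


def pd_policy(position, velocity):
    """PD controller tuned for context = +1 (illustrative gains)."""
    return -2.0 * position - 1.0 * velocity


def rollout(policy, context, steps=50):
    """Run the policy from a fixed start and return the final position."""
    position, velocity = 1.0, 0.0
    for _ in range(steps):
        action = policy(position, velocity)
        position, velocity = step(position, velocity, action, context)
    return position


final_normal = rollout(pd_policy, context=+1)    # controller pulls toward 0
final_inverted = rollout(pd_policy, context=-1)  # same controller pushes away
```

A policy that ignores the context stabilizes one mode and destabilizes the other, which is why averaging over contexts (as in domain randomization) can struggle on this kind of task.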
📝 Abstract
Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularly when the context is latent and must be inferred from data. A canonical failure mode is actuator inversion, where identical actions produce opposite physical effects under a latent binary context. We propose DMA*-SH, a framework where a single hypernetwork, trained solely via dynamics prediction, generates a small set of adapter weights shared across the dynamics model, policy, and action-value function. This shared modulation imparts an inductive bias matched to actuator inversion, while input/output normalization and random input masking stabilize context inference, promoting directionally concentrated representations. We provide theoretical support via an expressivity separation result for hypernetwork modulation, and a variance decomposition with policy-gradient variance bounds that formalize how within-mode compression improves learning under actuator inversion. For evaluation, we introduce the Actuator Inversion Benchmark (AIB), a suite of environments designed to isolate discontinuous context-to-dynamics interactions. On AIB's held-out actuator-inversion tasks, DMA*-SH achieves zero-shot generalization, outperforming domain randomization by 111.8% and surpassing a standard context-aware baseline by 16.1%.