🤖 AI Summary
This work reframes meta-reinforcement learning as a symmetry discovery problem to overcome the limitations of conventional approaches that rely on smooth task encodings and support only local generalization, particularly in sparse and structured task spaces. By introducing a "hereditary geometry" induced by system symmetries, the method leverages Lie group actions to jointly transform states and actions, enabling non-local generalization across tasks. Building on differential symmetry discovery, the approach translates functional invariance constraints into efficiently solvable linearized forms, and proves that the task space embeds into a connected, compact Lie subgroup of those symmetries with linearizable actions, a geometrically well-behaved structure that supports efficient learning and inference. In 2D navigation tasks, the method accurately recovers the true underlying symmetries, achieving full-task-space generalization and significantly outperforming baselines that generalize only within neighborhoods of training tasks.
📄 Abstract
Meta-Reinforcement Learning (Meta-RL) commonly generalizes via smoothness in the task encoding. While this enables local generalization around each training task, it requires dense coverage of the task space and leaves richer task-space structure untapped. In response, we develop a geometric perspective that endows the task space with a "hereditary geometry" induced by the inherent symmetries of the underlying system. Concretely, the agent reuses a policy learned at training time by transforming states and actions through the actions of a Lie group. This converts Meta-RL into symmetry discovery rather than smooth extrapolation, enabling the agent to generalize to wider regions of the task space. We show that when the task space is inherited from the symmetries of the underlying system, the task space embeds into a subgroup of those symmetries whose actions are linearizable, connected, and compact, properties that enable efficient learning and inference at test time. To learn these structures, we develop a differential symmetry discovery method. This collapses functional invariance constraints into differential ones, thereby improving numerical stability and sample efficiency over functional approaches. Empirically, on a two-dimensional navigation task, our method efficiently recovers the ground-truth symmetry and generalizes across the entire task space, while a common baseline generalizes only near training tasks.
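The policy-reuse mechanism described above can be illustrated with a minimal sketch. The example below is not the paper's implementation; it assumes a toy 2D navigation setting where the task family is generated by planar rotations (an SO(2) symmetry), and the hypothetical `transferred_policy` shows the generic recipe: map the new task's state back through the inverse group element, query the policy trained on a single reference task, then map the resulting action forward through the group element.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix, an element of the Lie group SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def base_policy(state, goal=np.array([1.0, 0.0])):
    """Toy policy 'trained' on one reference task: move toward the goal at (1, 0)."""
    d = goal - state
    return d / (np.linalg.norm(d) + 1e-8)

def transferred_policy(state, theta):
    """Reuse the base policy on the task whose goal is rotated by theta:
    pull the state back through g^{-1}, act, then push the action forward through g."""
    g = rotation(theta)
    return g @ base_policy(g.T @ state)

# From the origin, the transferred policy for theta = pi/2 heads toward (0, 1),
# the rotated goal, with no retraining of the base policy.
print(transferred_policy(np.zeros(2), np.pi / 2))
```

Once the generating symmetry is discovered, a single trained policy covers every task reachable by a group element, which is what enables the non-local, full-task-space generalization claimed above.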