Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses zero-shot transfer in hierarchical reinforcement learning under general reward functions, where existing methods often rely on fixed temporal abstractions or goal-conditioned objectives. The authors propose switching successor measures, an extension of classical successor measures that enables hierarchical control without additional supervision, fixed time horizons, or manually designed subgoals. Building on the forward-backward (FB) representation, they introduce the FB π-Switch algorithm, which extracts both a high-level subgoal-selection policy and a low-level control policy from a single learned representation. Experiments show that the approach improves over non-hierarchical baselines on both general reward-based and goal-conditioned tasks, and matches state-of-the-art hierarchical methods in the goal-conditioned setting.

📝 Abstract

Hierarchical reinforcement learning can improve generalization by decomposing long-horizon decision-making into simpler subproblems. However, existing approaches often rely on restrictive design choices, such as fixed temporal abstractions or goal-conditioned objectives, which largely confine them to goal-reaching tasks and limit their applicability to general reward functions. In this paper, we introduce switching successor measures, an extension of successor measures that enables hierarchical control in zero-shot reinforcement learning without additional supervision, fixed horizons, or manually designed subgoals. We show that switching successor measures arise naturally from classical successor measures while preserving their underlying structure. Building on this result, we propose FB $π$-Switch, an algorithm that extracts both a high-level subgoal-selection policy and a low-level control policy directly from forward-backward (FB) representations, allowing hierarchical behavior to emerge from a single learned representation. Experiments on both goal-conditioned and general reward-based tasks show that FB $π$-Switch improves over non-hierarchical baselines and matches state-of-the-art hierarchical methods in goal-conditioned settings. These results demonstrate that structured successor representations provide a flexible foundation for hierarchical zero-shot reinforcement learning beyond goal-reaching tasks. Our project website is available at: https://stestokth.github.io/switching-successors/.
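
For readers unfamiliar with the classical successor measures that the abstract builds on, the following minimal sketch may help. It is not the paper's switching successor measure or the FB $π$-Switch algorithm. It only illustrates the standard tabular idea that, once the discounted state-occupancy matrix of a fixed policy is known, the value under any reward vector follows from a single matrix-vector product. The 4-state chain, the transition probabilities, the reward vectors, and the state-based convention $M = \sum_{t \ge 0} \gamma^t P_\pi^t$ are all assumptions chosen for illustration.

```python
import numpy as np


def successor_measure(P_pi: np.ndarray, gamma: float) -> np.ndarray:
    """Tabular successor measure of a fixed policy.

    Uses the convention M = sum_{t>=0} gamma^t P_pi^t = (I - gamma * P_pi)^{-1},
    where P_pi[s, s'] is the state-to-state transition matrix under the policy.
    """
    n = P_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P_pi)


# Hypothetical 4-state chain (illustrative only): the policy moves one step
# to the right with probability 0.9 and stays put otherwise; the last state
# is absorbing.
n_states, gamma = 4, 0.9
P_pi = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    P_pi[s, s + 1] = 0.9
    P_pi[s, s] = 0.1
P_pi[-1, -1] = 1.0

M = successor_measure(P_pi, gamma)

# The same M evaluates the policy under any reward without further learning.
r_goal = np.array([0.0, 0.0, 0.0, 1.0])    # goal-reaching style reward
r_dense = np.array([0.1, -0.2, 0.3, 0.0])  # arbitrary general reward
v_goal = M @ r_goal
v_dense = M @ r_dense
```

In this toy setting, `M` is computed once and reused for both rewards, which is the property that makes successor-measure-based representations attractive for zero-shot reinforcement learning. FB representations, as named in the abstract, learn a factorized approximation of such measures rather than inverting a matrix.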

Problem

Research questions and friction points this paper is trying to address.

hierarchical reinforcement learning

zero-shot reinforcement learning

successor measures

general reward functions

temporal abstraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

switching successor measures

hierarchical reinforcement learning

zero-shot reinforcement learning