Meta-learning how to Share Credit among Macro-Actions

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Macro-actions can improve exploration efficiency in reinforcement learning, yet naïvely adding them often degrades exploration because of a trade-off: each macro-action reduces the number of decisions per episode but also enlarges the action space, which is typically searched as if every macro were atomic and independent. This paper proposes a similarity-regularized meta-learning framework: it models action–macro-action similarity as a learnable structure, relaxing the conventional assumptions of macro-action atomicity and independence; this structure explicitly constrains the policy network to improve credit assignment and reduce the effective dimensionality of the action space. The similarity matrix is meta-learned jointly with the policy and transfers across tasks. Evaluated on Atari games and StreetFighter II, the method significantly outperforms a Rainbow-DQN baseline, and the learned similarity matrix generalizes to related environments, suggesting strong transferability and robustness.

📝 Abstract
One proposed mechanism to improve exploration in reinforcement learning is the use of macro-actions. Paradoxically, though, in many scenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this is caused by adding non-useful macros, and multiple works have focused on mechanisms to effectively discover useful, environment-specific macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-off between reducing the average number of decisions per episode and increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with the desired policy. We empirically validate our strategy by looking at macro-actions in Atari games and the StreetFighter II environment. Our results show significant improvements over the Rainbow-DQN baseline in all environments. Additionally, we show that the macro-action similarity is transferable to related environments. We believe this work is a small but important step towards understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, therefore making learning more effective.
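The abstract does not give the exact form of the regularizer, but the idea it describes — a similarity matrix over actions and macro-actions that ties together their value estimates so credit assigned to one propagates to similar ones — can be sketched as a graph-Laplacian-style penalty. Everything below (the function name, the toy Q-values, the hand-picked similarity matrix) is a hypothetical illustration, not the paper's actual formulation:

```python
import numpy as np

def similarity_regularizer(q_values, S):
    """Toy penalty: sum_{i,j} S[i, j] * (q[i] - q[j])^2.

    A large S[i, j] pulls the value estimates of actions i and j
    together, so credit assigned to one informs the other. In the
    paper S is meta-learned jointly with the policy; here it is fixed.
    """
    diff = q_values[:, None] - q_values[None, :]  # pairwise Q-value gaps
    return float(np.sum(S * diff ** 2))

# Two primitive actions plus one macro-action built from them.
q = np.array([1.0, 1.1, 3.0])
S = np.array([[0.0, 0.9, 0.1],   # actions 0 and 1 deemed very similar
              [0.9, 0.0, 0.1],
              [0.1, 0.1, 0.0]])

penalty = similarity_regularizer(q, S)  # → 1.54
```

Adding such a penalty to the TD loss discourages the learner from treating each macro as an independent, atomic action, which is the mechanism the abstract credits for reducing the effective dimension of the action space.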
Problem

Research questions and friction points this paper is trying to address.

Improving exploration in reinforcement learning with macro-actions
Addressing trade-offs between decision reduction and action space size
Enhancing credit assignment via meta-learned action similarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learned similarity matrix for credit assignment
Regularization term reduces action space dimension
Transferable macro-action similarity across environments
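On the transfer point: the paper states that the learned similarity is reusable in related environments but does not spell out the mechanism. One plausible (hypothetical) reading is that the block of the learned matrix covering actions shared between the source and target tasks seeds the target matrix, while entries for new actions start from a neutral value and are meta-learned as usual:

```python
import numpy as np

def transfer_similarity(S_source, n_shared, n_target, neutral=0.0):
    """Hypothetical warm start for a related task.

    Assumes the first `n_shared` actions are common to both tasks
    and occupy the same indices; their learned similarities are
    copied, and all remaining entries start at `neutral`.
    """
    S_target = np.full((n_target, n_target), neutral)
    S_target[:n_shared, :n_shared] = S_source[:n_shared, :n_shared]
    return S_target

S_source = np.array([[0.0, 0.9],
                     [0.9, 0.0]])
S_target = transfer_similarity(S_source, n_shared=2, n_target=4)
```

This is only a sketch of one possible transfer scheme; the paper's experiments would have to be consulted for the actual procedure.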