Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning

📅 2025-12-15
🤖 AI Summary
Existing context-adaptive meta-reinforcement learning (meta-RL) approaches infer the task from experience that includes actions. This tightly couples the task representation to the policy that collected the data, which hinders generalization and modular training. To address this, we propose CRAFT (Context Representation via Action-Free Transformer encoder-decoder), a belief model that infers task context solely from state-reward sequences and thereby decouples task inference from policy optimization. The method uses a Transformer encoder-decoder with Rotary Position Embedding (RoPE) and amortized variational inference to model both parametric and non-parametric task variations. On the MetaWorld ML-10 robotic manipulation benchmark, CRAFT achieves faster adaptation, improved cross-task generalization, and more effective exploration than context-adaptive meta-RL baselines.
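The summary mentions Rotary Position Embedding (RoPE). As a minimal sketch, assuming the interleaved-pair layout and base 10000 of the original RoFormer formulation (not necessarily the configuration used in CRAFT), RoPE rotates each pair of query/key dimensions by an angle proportional to the token position, so the attention score between two tokens depends only on their relative offset. The helper names `rope` and `dot` are hypothetical.

```python
import math
from typing import List, Sequence


def rope(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Apply rotary position embedding to one query or key vector at position `pos`."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even vector dimension")
    out: List[float] = []
    for i in range(0, d, 2):
        # Pair k = i // 2 rotates with frequency base**(-2k / d) = base**(-i / d).
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[i], x[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])  # 2-D rotation of the pair
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Usage: both scores use the same offset (2), so they agree despite different absolute positions.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.2]
score_near = dot(rope(q, 5), rope(k, 3))
score_far = dot(rope(q, 105), rope(k, 103))
```

Because each pair is rotated rather than shifted, vector norms are preserved and position zero leaves the vector unchanged; relative-offset behavior is one reason rotary embeddings suit long state-reward histories.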

📝 Abstract
Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often generalize poorly to unseen tasks. Context-adaptive meta reinforcement learning addresses this limitation by conditioning the policy on a task representation, yet most such methods rely on complete action information in the collected experience, which tightly couples task inference to a specific policy. This paper introduces Context Representation via Action-Free Transformer encoder-decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder-decoder with rotary positional embeddings, the model captures long-range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration compared to context-adaptive meta-RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
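The core idea, building the inference context from states and rewards only, can be illustrated with a minimal, hypothetical sketch. The `Transition` record, the `action_free_context` helper, and the token layout (state concatenated with reward) are illustrative assumptions, not the authors' data format. Because actions never enter the context, two trajectories that differ only in the actions recorded produce identical encoder inputs.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float


def action_free_context(trajectory: List[Transition]) -> List[Tuple[float, ...]]:
    """One context token per step: the state concatenated with the reward; the action is dropped."""
    return [tuple(step.state) + (float(step.reward),) for step in trajectory]


# Two trajectories with the same states and rewards but different recorded actions,
# standing in for data collected by two different policies.
traj_a = [Transition((0.0, 0.0), (1.0,), 0.0), Transition((0.1, 0.0), (1.0,), 0.5)]
traj_b = [step._replace(action=(-1.0,)) for step in traj_a]

ctx_a = action_free_context(traj_a)
ctx_b = action_free_context(traj_b)
```

In a full model these tokens would be embedded and passed to the transformer encoder; the sketch only shows that the encoder input does not depend on which policy chose the actions.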
Problem

Research questions and friction points this paper is trying to address.

Decouples task inference from policy optimization
Infers task representations without action information
Improves generalization and adaptation in meta-reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-free transformer encoder-decoder for task inference
Decouples task inference from policy optimization
Uses amortized variational inference for scalable belief updates
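Amortized variational inference is commonly realized in variational meta-RL belief models by having an encoder output the mean and log-variance of a diagonal Gaussian posterior over the task latent. A sample is drawn with the reparameterization trick, and the posterior is regularized by a KL term toward a prior. The sketch below assumes a diagonal Gaussian posterior and a standard-normal prior; the source does not specify either, and the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way is what lets gradients flow through mu and sigma
    when the same computation runs inside an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


# Usage with made-up encoder outputs for a 2-D task latent.
rng = random.Random(0)
mu, logvar = [0.5, -0.2], [0.0, math.log(0.25)]
z = reparameterize(mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
```

"Amortized" means one encoder network maps any context to `mu` and `logvar`, so the belief for a new task comes from a single forward pass instead of a per-task optimization.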
Amir M. Soufi Enayati
Department of Mechanical Engineering, University of Victoria
Homayoun Honari
Mila-Quebec AI Institute
Homayoun Najjaran
University of Victoria
Categories: Control, Robotics, Automation