Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning

📅 2025-12-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing context-adaptive meta-reinforcement learning approaches infer the task from experience that includes actions. This couples the task representation to the policy that generated the data, which hinders generalization and modular training. To address this, the paper proposes CRAFT (Context Representation via Action-Free Transformer encoder-decoder), a belief model that infers the task context solely from state-reward sequences and thereby decouples task inference from policy optimization. The method uses a Transformer encoder-decoder with Rotary Position Embedding (RoPE) and amortized variational inference to capture both parametric and non-parametric task variations. On the MetaWorld ML-10 robotic manipulation benchmark, CRAFT adapts faster, generalizes better, and explores more effectively than context-adaptive meta-RL baselines.
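The summary mentions Rotary Position Embedding (RoPE). Below is a minimal pure-Python sketch of the standard RoPE rotation, assuming the common base of 10000 and pairing of adjacent feature dimensions; the function names are hypothetical and nothing here is taken from the CRAFT code. It shows the property that makes RoPE attractive for long context sequences: the dot product between a rotated query and a rotated key depends only on their relative offset, not on their absolute positions.

```python
import math
from typing import List


def rotary_embed(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each feature pair (x[2i], x[2i+1]) by the angle position * theta_i,
    where theta_i = base ** (-2i / d). Assumes the standard RoPE frequency schedule."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)          # per-pair rotation frequency
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s            # 2-D rotation of the pair
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(p * q for p, q in zip(a, b))


# Usage: a query at position 5 attending to a key at position 2 gets the same
# score as a query at 13 attending to a key at 10 (offset 3 in both cases).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 2.0]
score_early = dot(rotary_embed(q, 5), rotary_embed(k, 2))
score_late = dot(rotary_embed(q, 13), rotary_embed(k, 10))
```

Because each pair is rotated rather than shifted, vector norms are preserved, and the two scores above agree up to floating-point error.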

📝 Abstract

Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often generalize poorly to unseen tasks. Context-adaptive meta reinforcement learning addresses this limitation by conditioning the policy on a task representation, yet most such methods rely on complete action information in the collected experience, which makes task inference tightly coupled to a specific policy. This paper introduces Context Representation via Action-Free Transformer encoder-decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder-decoder with rotary positional embeddings, the model captures long-range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration than context-adaptive meta-RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
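To make the decoupling concrete, here is a hypothetical sketch of how a context sequence could be built with and without actions. The per-step token layout (state features followed by the scalar reward) and all names are assumptions for illustration; the abstract says only that CRAFT consumes sequences of states and rewards.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    state: Tuple[float, ...]   # observation s_t
    action: Tuple[float, ...]  # action a_t (recorded, but ignored by the action-free encoder)
    reward: float              # reward r_t received after taking a_t


def action_free_tokens(trajectory: Sequence[Step]) -> List[Tuple[float, ...]]:
    # One token per step: state features followed by the reward.
    # The action is dropped, so the context does not contain the policy's outputs.
    return [step.state + (step.reward,) for step in trajectory]


def action_conditioned_tokens(trajectory: Sequence[Step]) -> List[Tuple[float, ...]]:
    # Conventional context for comparison: the action is part of every token.
    return [step.state + step.action + (step.reward,) for step in trajectory]


# Two policies choose different actions but happen to visit the same states
# and collect the same rewards.
traj_a = [Step((0.0, 1.0), (0.5,), 0.0), Step((0.1, 0.9), (0.2,), 1.0)]
traj_b = [Step((0.0, 1.0), (-0.3,), 0.0), Step((0.1, 0.9), (0.7,), 1.0)]
```

Here the action-free tokens of `traj_a` and `traj_b` are identical, while the action-conditioned tokens differ. A belief model fed the action-free sequence therefore cannot pick up features that belong to the data-collecting policy rather than the task.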

Problem

Research questions and friction points this paper is trying to address.

Task inference is tightly coupled to the policy that collected the experience

Existing context-adaptive methods need action information to infer task representations

Generalization and adaptation to unseen tasks remain limited in meta-reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-free transformer encoder-decoder for task inference

Decouples task inference from policy optimization

Uses amortized variational inference for scalable belief updates
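Amortized variational inference means a single encoder network maps any context to the parameters of an approximate posterior over the task latent, instead of optimizing a separate posterior for each task. The sketch below shows only the objective pieces under common VAE-style assumptions: a diagonal Gaussian posterior, a standard-normal prior, a unit-variance Gaussian decoder likelihood, and a weight `beta` on the KL term. These choices and all function names are hypothetical; the abstract does not state CRAFT's prior, likelihood, or loss weighting, and the encoder and decoder networks themselves are omitted.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], log_var: List[float], rng: random.Random) -> List[float]:
    # z = mu + sigma * eps with eps ~ N(0, 1), so gradients can flow through mu and sigma.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    # Closed form: KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def gaussian_log_likelihood(target: List[float], pred: List[float], var: float = 1.0) -> float:
    # Log-density of the decoder's reconstruction (e.g. of states and rewards) under a fixed-variance Gaussian.
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (t - p) ** 2 / var)
               for t, p in zip(target, pred))


def elbo(recon_log_lik: float, mu: List[float], log_var: List[float], beta: float = 1.0) -> float:
    # Evidence lower bound: reconstruction term minus the (weighted) KL regularizer.
    return recon_log_lik - beta * kl_to_standard_normal(mu, log_var)
```

In training, one would maximize `elbo` averaged over context windows; because the encoder is shared across tasks, updating the belief for a new task costs a single forward pass rather than a fresh optimization.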

🔎 Similar Papers

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

2024-05-22 · Citations: 6

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

2024-10-09 · arXiv.org · Citations: 9