TIMRL: A Novel Meta-Reinforcement Learning Framework for Non-Stationary and Multi-Task Environments

📅 2025-01-13

📈 Citations: 0

✨ Influential: 0

career value

233K/year

🤖 AI Summary

To address weak task representation, inaccurate task identification, and low sample efficiency in non-stationary multi-task reinforcement learning (RL), this paper proposes a task inference framework integrating Gaussian Mixture Models (GMM) with Transformer architectures. Departing from conventional meta-RL approaches that assume a single Gaussian prior over tasks, our method enables explicit task encoding and dynamic, probabilistic task classification—overcoming limitations of static task representations. The framework jointly optimizes supervised task classification and MAML-style policy adaptation. Evaluated on MuJoCo-based non-stationary multi-task benchmarks, it achieves a +23.6% improvement in task identification accuracy and state-of-the-art sample efficiency, while enhancing cross-task policy generalization. Our core contribution is the GMM-Transformer co-modeling mechanism, which—novel in meta-RL—unifies probabilistic mixture modeling with sequential task reasoning within a single unified architecture.

Technology Category

Application Category

📝 Abstract

In recent years, meta-reinforcement learning (meta-RL) algorithm has been proposed to improve sample efficiency in the field of decision-making and control, enabling agents to learn new knowledge from a small number of samples. However, most research uses the Gaussian distribution to extract task representation, which is poorly adapted to tasks that change in non-stationary environment. To address this problem, we propose a novel meta-reinforcement learning method by leveraging Gaussian mixture model and the transformer network to construct task inference model. The Gaussian mixture model is utilized to extend the task representation and conduct explicit encoding of tasks. Specifically, the classification of tasks is encoded through transformer network to determine the Gaussian component corresponding to the task. By leveraging task labels, the transformer network is trained using supervised learning. We validate our method on MuJoCo benchmarks with non-stationary and multi-task environments. Experimental results demonstrate that the proposed method dramatically improves sample efficiency and accurately recognizes the classification of the tasks, while performing excellently in the environment.

Problem

Research questions and friction points this paper is trying to address.

Robot Learning

Task Recognition

Adaptive Environment

Innovation

Methods, ideas, or system contributions that make the work stand out.

TIMRL

Gaussian Mixture Models

Transformer Networks

🔎 Similar Papers

Boosting Hierarchical Reinforcement Learning with Meta-Learning for Complex Task Adaptation