MrCoM: A Meta-Regularized World-Model Generalizing Across Multi-Scenarios

📅 2025-11-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the limited multi-scenario generalization capability of existing model-based reinforcement learning (MBRL) methods by proposing a unified, cross-task transferable world model. Methodologically, it introduces: (1) a dynamics-driven latent state space decomposition that disentangles scenario-specific and shared representations; (2) meta-state and meta-value regularization to align policy optimization with world modeling objectives; and (3) a variational inference-based contextual world model architecture integrating meta-learning with dynamic modeling. Theoretical analysis derives an upper bound on multi-scenario generalization error. Empirical evaluation demonstrates that the proposed approach significantly outperforms state-of-the-art MBRL methods on multi-scenario benchmarks, achieving superior generalization performance and sample efficiency.

📝 Abstract
Model-based reinforcement learning (MBRL) is a crucial approach to enhance the generalization capabilities and improve the sample efficiency of RL algorithms. However, current MBRL methods focus primarily on building world models for single tasks and rarely address generalization across different scenarios. Building on the insight that dynamics within the same simulation engine share inherent properties, we attempt to construct a unified world model capable of generalizing across different scenarios, named Meta-Regularized Contextual World-Model (MrCoM). This method first decomposes the latent state space into various components based on the dynamics characteristics, thereby enhancing the accuracy of world-model prediction. Further, MrCoM adopts meta-state regularization to extract a unified representation of scenario-relevant information, and meta-value regularization to align world-model optimization with policy learning across diverse scenario objectives. We theoretically analyze the generalization error upper bound of MrCoM in multi-scenario settings. We systematically evaluate our algorithm's generalization ability across diverse scenarios, demonstrating significantly better performance than previous state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Building unified world models for multi-scenario generalization in reinforcement learning
Decomposing latent state space to improve world-model prediction accuracy
Aligning world-model optimization with policy learning across diverse scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposing the latent state space according to dynamics characteristics
Using meta-state regularization to extract a unified scenario representation
Applying meta-value regularization to align world-model optimization with policy learning
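The decomposition idea above can be sketched in a few lines. Everything here is illustrative: the dimensions, the linear "encoders", and the squared-error regularizer are stand-ins chosen for clarity, not the paper's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not specify these.
OBS_DIM, SHARED_DIM, SPEC_DIM = 8, 4, 2

# Linear maps standing in for learned encoders:
# one shared across scenarios, one per scenario.
W_shared = rng.normal(size=(OBS_DIM, SHARED_DIM))
W_spec = {s: rng.normal(size=(OBS_DIM, SPEC_DIM)) for s in ("walker", "hopper")}

def encode(obs, scenario):
    """Split the latent state into a scenario-shared component and a
    scenario-specific component (illustrative, not MrCoM's model)."""
    z_shared = obs @ W_shared        # representation of shared dynamics
    z_spec = obs @ W_spec[scenario]  # scenario-specific representation
    return z_shared, z_spec

def meta_state_reg(z_shared_a, z_shared_b):
    """Toy meta-state regularizer: penalize disagreement between the
    shared representations of the same observation across scenarios."""
    return float(np.mean((z_shared_a - z_shared_b) ** 2))

obs = rng.normal(size=OBS_DIM)
z_a, _ = encode(obs, "walker")
z_b, _ = encode(obs, "hopper")
print(meta_state_reg(z_a, z_b))  # non-negative penalty, driven toward 0 in training
```

The design intent mirrored here is that the shared component is pushed toward a scenario-agnostic representation (the regularizer is zero when the two encodings agree), while the scenario-specific component remains free to absorb per-scenario dynamics.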
Xuantang Xiong
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ni Mu
Department of Automation, Tsinghua University
Runpeng Xie
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Senhao Yang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yaqing Wang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Lexiang Wang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yao Luan
Department of Automation, Tsinghua University
Siyuan Li
Faculty of Computing, Harbin Institute of Technology
Shuang Xu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yiqin Yang
Assistant Professor, Institute of Automation, Chinese Academy of Sciences
Reinforcement Learning · Embodied Intelligence
Bo Xu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China