MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of low-quality trajectory generation and poor control accuracy in offline reinforcement learning under long-horizon sparse-reward settings. To this end, the authors propose a multi-scale autoregressive generative framework that, for the first time, explicitly models the multi-scale temporal structure of trajectories. By integrating a condition-guided multi-scale autoencoder with a multi-scale Transformer, the method enables coarse-to-fine, coherent, and controllable trajectory synthesis. The approach unifies multi-scale representation learning with conditional generative modeling, substantially enhancing policy performance in long-horizon sparse-reward scenarios. Experimental results show that the proposed method outperforms 15 baseline algorithms across five offline RL benchmarks, achieving significant improvements in both trajectory coherence and control precision.

📝 Abstract
Generative models have gained significant traction in offline reinforcement learning (RL) due to their ability to model complex trajectory distributions. However, existing generation-based approaches still struggle with long-horizon tasks characterized by sparse rewards. Some hierarchical generation methods have been developed to mitigate this issue by decomposing the original problem into shorter-horizon subproblems using one policy and generating detailed actions with another. While effective, these methods often overlook the multi-scale temporal structure inherent in trajectories, resulting in suboptimal performance. To overcome these limitations, we propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method. MAGE incorporates a condition-guided multi-scale autoencoder to learn hierarchical trajectory representations, along with a multi-scale transformer that autoregressively generates trajectory representations from coarse to fine temporal scales. MAGE effectively captures temporal dependencies of trajectories at multiple resolutions. Additionally, a condition-guided decoder is employed to exert precise control over short-term behaviors. Extensive experiments on five offline RL benchmarks against fifteen baseline algorithms show that MAGE successfully integrates multi-scale trajectory modeling with conditional guidance, generating coherent and controllable trajectories in long-horizon sparse-reward settings.
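The coarse-to-fine generation scheme described in the abstract can be sketched in miniature. The scale schedule, the repetition-based upsampler, and the toy refinement rule below are illustrative assumptions for exposition only, not MAGE's actual autoencoder or multi-scale transformer:

```python
# Toy sketch of coarse-to-fine trajectory generation (illustrative only:
# the scale schedule, upsampler, and refinement rule are assumptions,
# not MAGE's actual autoencoder/transformer components).

def upsample(coarse, factor):
    """Expand a coarse trajectory to a finer resolution by repetition."""
    return [v for v in coarse for _ in range(factor)]

def generate_coarse_to_fine(condition, horizon, scales=(4, 2, 1)):
    """Generate a 1-D trajectory scale by scale, coarsest first.

    `condition` stands in for return/goal conditioning; here it is a
    scalar offset so the sketch stays self-contained and runnable.
    """
    traj = None
    for factor in scales:  # e.g. every 4th step, every 2nd, every step
        steps = horizon // factor
        if traj is None:
            # Coarsest scale: generated directly from the condition.
            traj = [condition + t * factor for t in range(steps)]
        else:
            # Finer scale: refine the upsampled coarser trajectory,
            # so each level only fills in detail between coarse anchors.
            expand = steps // len(traj)
            prior = upsample(traj, expand)
            traj = [p + (i % expand) for i, p in enumerate(prior)]
    return traj

print(generate_coarse_to_fine(0, 8))  # -> [0, 1, 1, 2, 4, 5, 5, 6]
```

The point of the structure is that each finer scale conditions on the scale above it, mirroring how the paper's multi-scale transformer autoregresses from coarse to fine temporal resolutions while a condition-guided decoder controls short-term behavior.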
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
long-horizon tasks
sparse rewards
trajectory generation
multi-scale temporal structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-scale autoregressive generation
offline reinforcement learning
hierarchical trajectory representation
condition-guided decoding
sparse-reward tasks
Chenxing Lin
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Xinhui Gao
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Haipeng Zhang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Xinran Li
The Hong Kong University of Science and Technology (HKUST)
reinforcement learning; multi-agent reinforcement learning
Haitao Wang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Songzhu Mei
School of Computer, National University of Defense Technology, China
Chenglu Wen
Professor of Xiamen University
3D vision; point clouds; mobile mapping; robotics
Weiquan Liu
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China; College of Computer Engineering, Jimei University, China
Siqi Shen
Xiamen University
Reinforcement Learning; 3D Vision
Cheng Wang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China