MAGE: Multi-scale Autoregressive Generation for Offline Reinforcement Learning

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of low-quality trajectory generation and poor control accuracy in offline reinforcement learning under long-horizon sparse-reward settings. To this end, the authors propose a multi-scale autoregressive generative framework that, for the first time, explicitly models the multi-scale temporal structure of trajectories. By integrating a condition-guided multi-scale autoencoder with a multi-scale Transformer, the method enables coarse-to-fine, coherent, and controllable trajectory synthesis. The approach unifies multi-scale representation learning with conditional generative modeling, substantially enhancing policy performance in long-horizon sparse-reward scenarios. Experimental results show that the proposed method outperforms 15 baseline algorithms across five offline RL benchmarks, achieving significant improvements in both trajectory coherence and control precision.

📝 Abstract
Generative models have gained significant traction in offline reinforcement learning (RL) due to their ability to model complex trajectory distributions. However, existing generation-based approaches still struggle with long-horizon tasks characterized by sparse rewards. Some hierarchical generation methods have been developed to mitigate this issue by decomposing the original problem into shorter-horizon subproblems using one policy and generating detailed actions with another. While effective, these methods often overlook the multi-scale temporal structure inherent in trajectories, resulting in suboptimal performance. To overcome these limitations, we propose MAGE, a Multi-scale Autoregressive GEneration-based offline RL method. MAGE incorporates a condition-guided multi-scale autoencoder to learn hierarchical trajectory representations, along with a multi-scale transformer that autoregressively generates trajectory representations from coarse to fine temporal scales. MAGE effectively captures temporal dependencies of trajectories at multiple resolutions. Additionally, a condition-guided decoder is employed to exert precise control over short-term behaviors. Extensive experiments on five offline RL benchmarks against fifteen baseline algorithms show that MAGE successfully integrates multi-scale trajectory modeling with conditional guidance, generating coherent and controllable trajectories in long-horizon sparse-reward settings.
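The coarse-to-fine generation scheme described in the abstract can be sketched in miniature. The scale schedule, the repetition-based upsampler, and the toy refinement rule below are illustrative assumptions for exposition only, not MAGE's actual autoencoder or multi-scale transformer:

```python
# Toy sketch of coarse-to-fine trajectory generation (illustrative only:
# the scale schedule, upsampler, and refinement rule are assumptions,
# not MAGE's actual autoencoder/transformer components).

def upsample(coarse, factor):
    """Expand a coarse trajectory to a finer resolution by repetition."""
    return [v for v in coarse for _ in range(factor)]

def generate_coarse_to_fine(condition, horizon, scales=(4, 2, 1)):
    """Generate a 1-D trajectory scale by scale, coarsest first.

    `condition` stands in for return/goal conditioning; here it is a
    scalar offset so the sketch stays self-contained and runnable.
    """
    traj = None
    for factor in scales:  # e.g. every 4th step, every 2nd, every step
        steps = horizon // factor
        if traj is None:
            # Coarsest scale: generated directly from the condition.
            traj = [condition + t * factor for t in range(steps)]
        else:
            # Finer scale: refine the upsampled coarser trajectory,
            # so each level only fills in detail between coarse anchors.
            expand = steps // len(traj)
            prior = upsample(traj, expand)
            traj = [p + (i % expand) for i, p in enumerate(prior)]
    return traj

print(generate_coarse_to_fine(0, 8))  # -> [0, 1, 1, 2, 4, 5, 5, 6]
```

The point of the structure is that each finer scale conditions on the scale above it, mirroring how the paper's multi-scale transformer autoregresses from coarse to fine temporal resolutions while a condition-guided decoder controls short-term behavior.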
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
long-horizon tasks
sparse rewards
trajectory generation
multi-scale temporal structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-scale autoregressive generation
offline reinforcement learning
hierarchical trajectory representation
condition-guided decoding
sparse-reward tasks
Chenxing Lin
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Xinhui Gao
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Haipeng Zhang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Xinran Li
The Hong Kong University of Science and Technology (HKUST)
reinforcement learning; multi-agent reinforcement learning
Haitao Wang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China
Songzhu Mei
School of Computer, National University of Defense Technology, China
Chenglu Wen
Professor of Xiamen University
3D vision; point clouds; mobile mapping; robotics
Weiquan Liu
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China; College of Computer Engineering, Jimei University, China
Siqi Shen
Xiamen University
Reinforcement Learning; 3D Vision
Cheng Wang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University (XMU), China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, XMU, China