Scalable Offline Model-Based RL with Action Chunks

📅 2025-12-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In offline reinforcement learning, model-based approaches degrade on long-horizon tasks because model errors accumulate over multi-step rollouts. To address this, we propose Model-Based RL with Action Chunks (MAC): (1) an action-chunk dynamics model that predicts a future state from a sequence of actions rather than a single action, which reduces compounding error in long rollouts; (2) rejection sampling from an expressive behavioral action-chunk policy, which prevents the model from being exploited through out-of-distribution actions; and (3) model-based value expansion, which fits an on-policy value function from length-n imagined rollouts. In experiments on large-scale datasets of up to 100M transitions, MAC achieves the best performance among offline model-based RL algorithms, especially on challenging long-horizon tasks.

📝 Abstract

In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion, can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. Model-based value expansion fits an on-policy value function using length-n imaginary rollouts generated by the current policy and a learned dynamics model. While larger n reduces bias in value bootstrapping, it amplifies accumulated model errors over long horizons, degrading future predictions. We address this trade-off with an action-chunk model that predicts a future state from a sequence of actions (an "action chunk") instead of a single action, which reduces compounding errors. In addition, instead of directly training a policy to maximize rewards, we employ rejection sampling from an expressive behavioral action-chunk policy, which prevents model exploitation from out-of-distribution actions. We call this recipe Model-Based RL with Action Chunks (MAC). Through experiments on highly challenging tasks with large-scale datasets of up to 100M transitions, we show that MAC achieves the best performance among offline model-based RL algorithms, especially on challenging long-horizon tasks.
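
To make the role of n concrete, here is a minimal sketch of the standard n-step value expansion target: discounted imagined rewards plus a discounted bootstrap from the value function at the final imagined state. The function name, the argument names, and the discount factor `gamma` are illustrative assumptions; the abstract does not specify the exact target used in MAC.

```python
from typing import Sequence


def value_expansion_target(
    rewards: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Return sum_i gamma^i * r_i + gamma^n * V(s_n) for a length-n rollout.

    `rewards` holds the n imagined rewards and `bootstrap_value` is the
    value estimate at the last imagined state (both hypothetical inputs).
    """
    target = 0.0
    discount = 1.0
    for reward in rewards:
        target += discount * reward
        discount *= gamma
    # After the loop, discount == gamma ** n, so the bootstrap term
    # shrinks as n grows; this is why larger n reduces bootstrapping bias.
    return target + discount * bootstrap_value


# Two imagined steps with gamma = 0.5: 1 + 0.5 * 1 + 0.25 * 10
example_target = value_expansion_target([1.0, 1.0], 10.0, gamma=0.5)
```

A longer reward list leans less on `bootstrap_value` and more on model-generated rewards, which is where accumulated model error enters.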

Problem

Research questions and friction points this paper is trying to address.

Addresses the trade-off between value-bootstrapping bias and accumulated model error in offline model-based RL

Reduces compounding model errors with action-chunk predictions

Prevents model exploitation via expressive behavioral policy sampling
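
The effect of action chunks on rollout depth can be sketched as follows: with chunk length h, an n-step imagined rollout needs only n / h recursive model calls instead of n, so there are fewer points at which prediction error can feed back into the model. Everything here (the names `rollout_with_chunks`, `toy_policy`, `toy_model`, and the scalar toy dynamics) is a hypothetical illustration, not the paper's implementation.

```python
from typing import Callable, List, Sequence

State = float
Chunk = Sequence[float]


def rollout_with_chunks(
    state: State,
    chunk_policy: Callable[[State], Chunk],
    chunk_model: Callable[[State, Chunk], State],
    n: int,
    h: int,
) -> List[State]:
    """Imagine n environment steps using a model that jumps h steps at a time."""
    if h <= 0 or n % h != 0:
        raise ValueError("n must be a positive multiple of the chunk length h")
    states = [state]
    for _ in range(n // h):  # n / h model calls rather than n
        chunk = chunk_policy(state)
        state = chunk_model(state, chunk)  # predicts the state h steps ahead
        states.append(state)
    return states


# Toy stand-ins (assumptions): each action adds its value to a scalar state.
def toy_policy(state: State) -> Chunk:
    return [1.0] * 4


def toy_model(state: State, chunk: Chunk) -> State:
    return state + sum(chunk)


imagined_states = rollout_with_chunks(0.0, toy_policy, toy_model, n=8, h=4)
```

With n = 8 and h = 4 the model is applied twice, whereas a single-step model would be applied eight times over the same horizon.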

Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-chunk model reduces compounding errors

Rejection sampling from a behavioral action-chunk policy prevents model exploitation from out-of-distribution actions

Scalable offline RL for long-horizon tasks
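
One common way to realize rejection sampling from a behavior policy is best-of-N selection: draw several candidate action chunks from the behavior policy and keep the one that a learned critic scores highest. The sketch below assumes this form; `sample_chunk` and `score_chunk` are hypothetical callables standing in for the behavioral action-chunk policy and a value estimate, and the source does not confirm that MAC uses exactly this procedure.

```python
from typing import Callable, List, Sequence

State = float
Chunk = Sequence[float]


def select_chunk_by_rejection_sampling(
    state: State,
    sample_chunk: Callable[[State], Chunk],
    score_chunk: Callable[[State, Chunk], float],
    num_samples: int = 8,
) -> Chunk:
    """Sample candidate chunks from the behavior policy and keep the best one."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    # Every candidate comes from the behavior policy, so the selected chunk
    # stays within the support of the dataset's action distribution.
    candidates: List[Chunk] = [sample_chunk(state) for _ in range(num_samples)]
    return max(candidates, key=lambda chunk: score_chunk(state, chunk))
```

Because the policy is never trained by gradient ascent through the model or critic, it cannot drift toward actions that look good only because the model is wrong about them; increasing `num_samples` trades more optimization pressure for more compute.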

🔎 Similar Papers

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining