Reinforcement Learning with Action Chunking

📅 2025-07-10

🤖 AI Summary

To address ineffective exploration and low sample efficiency in offline-to-online reinforcement learning for long-horizon, sparse-reward tasks, this paper proposes Q-chunking, which applies action chunking to temporal difference (TD)-based RL. Q-chunking runs RL directly in a chunked action space, which lets the agent leverage temporally consistent behaviors from offline data for online exploration and use unbiased n-step backups for more stable and efficient TD learning. Empirically, Q-chunking shows strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.

📝 Abstract

We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
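The n-step backup mentioned in the abstract can be illustrated with a minimal sketch. It assumes the critic scores a state together with a whole chunk of h actions, so the h rewards collected while that chunk runs can be summed directly into the TD target and followed by a bootstrap from the next chunk. The function name `chunked_td_target` and its parameters are hypothetical and chosen for illustration; they are not taken from the paper's code.

```python
from typing import Sequence


def chunked_td_target(
    rewards: Sequence[float],
    next_q: float,
    gamma: float,
    done: bool = False,
) -> float:
    """Illustrative h-step TD target for one executed action chunk.

    rewards: r_t, ..., r_{t+h-1}, collected while the chunk was executed.
    next_q:  assumed critic estimate for the next state and next chunk.
    gamma:   per-step discount factor.
    done:    True if the episode ended inside the chunk (no bootstrap).
    """
    h = len(rewards)
    # Discounted sum of the rewards observed during the chunk.
    discounted_return = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap from the value of the next chunk, discounted by gamma^h.
    bootstrap = 0.0 if done else (gamma ** h) * next_q
    return discounted_return + bootstrap


if __name__ == "__main__":
    # Sparse reward arriving on the last step of a 3-action chunk.
    print(chunked_td_target([0.0, 0.0, 1.0], next_q=2.0, gamma=0.5))
```

In this sketch the rewards inside the target come from exactly the actions the critic is conditioned on, which is one way to read the abstract's claim that the n-step backup is unbiased.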

Problem

Research questions and friction points this paper is trying to address.

Improving RL for long-horizon sparse-reward tasks

Enhancing offline-to-online RL sample efficiency

Mitigating exploration challenges with action chunking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-chunking runs TD-based RL directly in a chunked action space

Leverages offline data for online exploration

Uses unbiased n-step backups for TD learning
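As a rough illustration of how offline data could feed a chunked action space, the sketch below slices one logged trajectory into chunk-level transitions of the form (state, action chunk, discounted chunk reward, state after the chunk). The helper `make_chunk_transitions`, the sliding-window choice, and the tuple layout are assumptions made for this example, not details confirmed by the source.

```python
from typing import Any, List, Sequence, Tuple

ChunkTransition = Tuple[Any, Tuple[Any, ...], float, Any]


def make_chunk_transitions(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    h: int,
    gamma: float,
) -> List[ChunkTransition]:
    """Slice a trajectory into transitions over chunks of h actions.

    states has length T + 1; actions and rewards have length T.
    """
    if len(states) != len(actions) + 1 or len(actions) != len(rewards):
        raise ValueError("expected T + 1 states, T actions and T rewards")
    transitions: List[ChunkTransition] = []
    total_steps = len(actions)
    # Slide a window of h consecutive actions over the trajectory.
    for t in range(total_steps - h + 1):
        chunk = tuple(actions[t:t + h])
        # Discounted sum of the rewards earned while the chunk ran.
        chunk_reward = sum((gamma ** i) * rewards[t + i] for i in range(h))
        transitions.append((states[t], chunk, chunk_reward, states[t + h]))
    return transitions


if __name__ == "__main__":
    demo = make_chunk_transitions(
        states=[0, 1, 2, 3, 4],
        actions=["a", "b", "c", "d"],
        rewards=[0.0, 0.0, 0.0, 1.0],
        h=2,
        gamma=0.5,
    )
    for transition in demo:
        print(transition)
```

Each stored chunk is a contiguous run of actions from the dataset, so a policy trained on such transitions sees temporally consistent action sequences rather than isolated single-step actions.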
