Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses an inefficiency of the standard Decision Transformer, which embeds Return-to-Go (RTG) as standalone tokens in the autoregressive sequence. Because self-attention cost grows quadratically with sequence length, these extra tokens add computational overhead even though a scalar RTG carries far less information than a state or action vector. The authors propose SlimDT, which removes RTG from the autoregressive sequence and injects it into the state representations before sequential modeling. The Transformer then processes only a compact (state, action) sequence, separating the low-information RTG signal from the information-rich trajectory tokens. On the D4RL benchmark, SlimDT surpasses the standard Decision Transformer across various tasks and performs comparably to existing state-of-the-art methods, while reducing sequence length by one-third and improving inference efficiency.
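The one-third figure follows from token counting. As a rough worked example, assume a context of K timesteps (K is a symbol introduced here for illustration, not taken from the paper) and count only the pairwise attention-score term. Layers whose cost is linear in the number of tokens scale with the plain token ratio of 2/3 instead.

```latex
\begin{align*}
n_{\mathrm{DT}} &= 3K \quad \text{(RTG, state, action per timestep)} \\
n_{\mathrm{SlimDT}} &= 2K \quad \text{(state, action per timestep)} \\
\frac{n_{\mathrm{SlimDT}}}{n_{\mathrm{DT}}} &= \frac{2}{3}
  \quad \Rightarrow \quad \text{sequence length reduced by } \tfrac{1}{3} \\
\frac{n_{\mathrm{SlimDT}}^{2}}{n_{\mathrm{DT}}^{2}} &= \frac{(2K)^{2}}{(3K)^{2}} = \frac{4}{9} \approx 0.44
\end{align*}
```

Under these assumptions, the quadratic part of self-attention does a little under half the work per forward pass. Actual speedups depend on the model size and implementation.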
📝 Abstract
Decision Transformer (DT) formulates offline reinforcement learning as autoregressive sequence modeling, achieving promising results by predicting actions from a sequence of Return-to-Go (RTG), state, and action tokens. However, RTG is a scalar that summarizes future rewards, containing far less information than typical state or action vectors, yet it consumes the same computational budget per token. Worse, the self-attention cost of Transformers grows quadratically with sequence length, so including RTG as a separate token adds unnecessary overhead. We propose SlimDT, which removes RTG from the autoregressive sequence. Instead, we inject RTG information into the state representations before the sequential modeling step, allowing the Transformer to process only a compact (state, action) sequence. This reduces the sequence length by one-third, directly improving inference efficiency. On the D4RL benchmark, SlimDT surpasses standard DT across various tasks and achieves performance comparable to existing state-of-the-art methods. Decoupling a sparse conditioning signal from an information-rich sequence thus yields both computational gains and higher task performance.
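The difference between the two token layouts can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract does not say how RTG is injected, so the additive linear projection below, along with the names `build_dt_tokens`, `build_injected_tokens`, `w_rtg`, and `b_rtg`, are assumptions made for the example.

```python
import numpy as np


def build_dt_tokens(rtg_emb, state_emb, action_emb):
    """Standard DT layout: interleave (RTG, state, action) per timestep.

    Each input has shape (K, d); the result has shape (3K, d).
    """
    K, d = state_emb.shape
    return np.stack([rtg_emb, state_emb, action_emb], axis=1).reshape(3 * K, d)


def build_injected_tokens(rtg, state_emb, action_emb, w_rtg, b_rtg):
    """Hypothetical RTG injection: RTG never becomes its own token.

    ASSUMPTION: the scalar RTG is projected to d dimensions with a linear
    map (w_rtg, b_rtg) and added to the state embedding. The paper may use
    a different injection mechanism.

    rtg has shape (K,); state_emb and action_emb have shape (K, d).
    The result interleaves (state, action) and has shape (2K, d).
    """
    K, d = state_emb.shape
    rtg_feat = rtg[:, None] * w_rtg[None, :] + b_rtg[None, :]  # (K, d)
    conditioned_state = state_emb + rtg_feat
    return np.stack([conditioned_state, action_emb], axis=1).reshape(2 * K, d)


# Toy inputs: K timesteps, d-dimensional embeddings (values are random).
rng = np.random.default_rng(0)
K, d = 4, 8
rtg = rng.normal(size=K)
state_emb = rng.normal(size=(K, d))
action_emb = rng.normal(size=(K, d))
w_rtg = rng.normal(size=d)
b_rtg = rng.normal(size=d)

# For the DT baseline, reuse the same projection as a stand-in RTG embedding.
rtg_emb = rtg[:, None] * w_rtg[None, :] + b_rtg[None, :]

dt_tokens = build_dt_tokens(rtg_emb, state_emb, action_emb)
slim_tokens = build_injected_tokens(rtg, state_emb, action_emb, w_rtg, b_rtg)
```

With K = 4, the baseline layout yields 12 tokens and the injected layout yields 8, which is the one-third reduction described in the abstract. Either token array would then be passed to a causal Transformer, which is omitted here.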
Problem

Research questions and friction points this paper is trying to address.

Decision Transformer
Return-to-Go
sequence modeling
offline reinforcement learning
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

SlimDT
Decision Transformer
Return-to-Go injection
sequence efficiency
offline reinforcement learning