🤖 AI Summary
This work addresses the input redundancy that arises in the Decision Transformer (DT) when it consumes full Return-to-Go (RTG) sequences, a redundancy that compromises both computational efficiency and performance. The authors propose the Decoupled Decision Transformer (DDT), which is the first to explicitly identify this redundancy and decouple the RTG conditioning mechanism: only the most recent RTG value guides action prediction, while the Transformer backbone processes solely the observation and action sequences. This streamlined architecture removes unnecessary computation, improves inference efficiency, and yields significant gains over the original DT across multiple offline reinforcement learning benchmarks, matching or surpassing current state-of-the-art DT variants.
📝 Abstract
The Decision Transformer (DT) established sequence modeling as a powerful approach to offline reinforcement learning. It conditions its action predictions on Return-to-Go (RTG), using it both to distinguish trajectory quality during training and to guide action generation at inference. In this work, we identify a critical redundancy in this design: feeding the entire sequence of RTGs into the Transformer is theoretically unnecessary, as only the most recent RTG affects action prediction. Our experiments show that this redundancy can impair DT's performance. To resolve this, we propose the Decoupled DT (DDT). DDT simplifies the architecture by processing only observation and action sequences through the Transformer, using the latest RTG to guide action prediction. This streamlined approach not only improves performance but also reduces computational cost. Our experiments show that DDT significantly outperforms DT and is competitive with state-of-the-art DT variants across multiple offline RL tasks.
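The decoupling described above can be illustrated with a minimal sketch. This is a hypothetical token-stream construction for intuition only, not the authors' implementation: DT interleaves (RTG, observation, action) triples into 3K tokens per K-step context, whereas DDT feeds the Transformer only (observation, action) pairs (2K tokens) and keeps the single latest RTG aside to condition the action head.

```python
# Hypothetical sketch (not the paper's code): contrast the token streams
# fed to the Transformer in DT vs. DDT for a context window of K steps.

def dt_tokens(rtgs, obs, acts):
    """Original DT: interleave (RTG, observation, action) -> 3K tokens."""
    tokens = []
    for g, s, a in zip(rtgs, obs, acts):
        tokens += [("rtg", g), ("obs", s), ("act", a)]
    return tokens

def ddt_tokens(rtgs, obs, acts):
    """DDT (as described in the abstract): the Transformer sees only
    (observation, action) pairs -> 2K tokens; the most recent RTG is
    returned separately to condition the action prediction."""
    tokens = []
    for s, a in zip(obs, acts):
        tokens += [("obs", s), ("act", a)]
    return tokens, rtgs[-1]  # single conditioning value, not a sequence

K = 4
rtgs = [10.0, 8.0, 5.0, 3.0]          # returns-to-go, one per timestep
obs = [f"s{t}" for t in range(K)]      # placeholder observations
acts = [f"a{t}" for t in range(K)]     # placeholder actions

assert len(dt_tokens(rtgs, obs, acts)) == 3 * K
seq, latest_rtg = ddt_tokens(rtgs, obs, acts)
assert len(seq) == 2 * K and latest_rtg == 3.0
```

The sketch makes the efficiency claim concrete: for the same context length K, DDT's backbone attends over a third fewer tokens, and RTG enters only once, at the prediction head.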