E$^2$DT: Efficient and Effective Decision Transformer with Experience-Aware Sampling for Robotic Manipulation

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of standard Decision Transformers in robotic manipulation, whose reliance on uniform experience replay leads to low sample efficiency, limited exploration, and suboptimal performance. To overcome these issues, the authors propose the E²DT framework, which integrates k-Determinantal Point Processes (k-DPPs) with the Decision Transformer to enable experience-aware sampling through a joint quality-diversity kernel. Trajectory diversity is measured using the Decision Transformer's latent embeddings, while trajectory quality is a composite of return-to-go (RTG) quantiles, predictive uncertainty, and stage coverage based on inverse frequency. The authors report that E²DT consistently outperforms prior methods on robotic manipulation benchmarks in both simulation and real-robot settings, and they present experience-aware sampling as a route to more robust long-horizon robotic learning.
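
For orientation, the quality-diversity decomposition commonly used in the DPP literature can be written as follows. This is a sketch of the standard form, not necessarily the exact kernel used in E²DT: the symbols $q_i$ (quality of trajectory window $i$), $\phi_i$ (its latent embedding), and $N$ (number of candidate windows) are assumptions introduced here for illustration.

$$
L_{ij} = q_i \, S_{ij} \, q_j, \qquad
S_{ij} = \frac{\phi_i^\top \phi_j}{\lVert \phi_i \rVert \, \lVert \phi_j \rVert}, \qquad
P(Y) = \frac{\det(L_Y)}{\sum_{|Y'| = k} \det(L_{Y'})} \quad \text{for } |Y| = k .
$$

Because $\det(L_Y) = \left( \prod_{i \in Y} q_i^2 \right) \det(S_Y)$, a size-$k$ subset is more likely when its members have high quality and when their embeddings are mutually dissimilar.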

📝 Abstract

In reinforcement learning (RL) for robotic manipulation, the Decision Transformer (DT) has emerged as an effective framework for addressing long-horizon tasks. However, DT's performance depends heavily on the coverage of collected experiences. Without an active exploration mechanism, standard DT relies on uniform replay, which leads to poor sample efficiency, limited exploration, and reduced overall effectiveness. Conversely, excessive exploration can help avoid local optima but often delays policy convergence and degrades efficiency. To address these limitations, we propose E$^2$DT, a DT-guided k-Determinantal Point Process sampling framework that enables the model to actively shape its own experience selection. Our framework is experience-aware, allowing E$^2$DT to be both efficient, by prioritizing high-quality samples (high-return, high-uncertainty, and underrepresented trajectories), and effective, by ensuring diversity across trajectory windows to preserve policy optimality. Specifically, DT's internal latent embeddings measure diversity across trajectory windows, while quality is quantified through a composite metric that integrates return-to-go (RTG) quantiles, predictive uncertainty, and stage coverage based on inverse frequency. These two dimensions are integrated into a novel quality-diversity joint kernel that prioritizes the most informative experiences, thereby enabling learning that is both efficient and effective. We evaluate E$^2$DT on challenging robotic manipulation benchmarks in both simulation and real-robot settings. Results show that it consistently outperforms prior methods. These findings demonstrate that coupling policy learning with experience-aware sampling provides a principled path toward robust long-horizon robotic learning.
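
The sampling idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made only for this example: the function names (`quality_scores`, `build_kernel`, `greedy_kdpp`), the equal-weight sum used to combine the three quality terms, the cosine-similarity diversity term, and the use of greedy log-determinant maximization as a deterministic stand-in for drawing an exact k-DPP sample.

```python
import numpy as np


def quality_scores(rtg, uncertainty, stage_ids, weights=(1.0, 1.0, 1.0)):
    """Hypothetical composite quality per trajectory window.

    Combines an RTG quantile, min-max normalized uncertainty, and
    inverse stage frequency with a weighted sum (an assumption).
    """
    rtg = np.asarray(rtg, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    stage_ids = np.asarray(stage_ids)
    n = len(rtg)
    # Empirical quantile of each return-to-go, in (0, 1].
    rtg_q = (np.argsort(np.argsort(rtg)) + 1) / n
    # Min-max normalize uncertainty; all zeros if it is constant.
    span = uncertainty.max() - uncertainty.min()
    unc = (uncertainty - uncertainty.min()) / span if span > 0 else np.zeros(n)
    # Rarer stages get a larger score (inverse frequency, scaled to at most 1).
    _, inverse, counts = np.unique(stage_ids, return_inverse=True, return_counts=True)
    inv_freq = 1.0 / counts[inverse]
    inv_freq = inv_freq / inv_freq.max()
    w_rtg, w_unc, w_stage = weights
    return w_rtg * rtg_q + w_unc * unc + w_stage * inv_freq


def build_kernel(embeddings, quality):
    """L[i, j] = q_i * cos_sim(phi_i, phi_j) * q_j (standard L-ensemble form)."""
    emb = np.asarray(embeddings, dtype=float)
    norms = np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    emb = emb / norms
    sim = emb @ emb.T
    q = np.asarray(quality, dtype=float)
    return q[:, None] * sim * q[None, :]


def greedy_kdpp(L, k):
    """Greedily pick up to k indices that maximize log det(L_Y).

    This is a MAP-style approximation, not an exact k-DPP sampler.
    """
    selected = []
    for _ in range(k):
        best, best_val = -1, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            val = logdet if sign > 0 else -np.inf
            if val > best_val:
                best, best_val = i, val
        if best < 0:  # no remaining candidate keeps the determinant positive
            break
        selected.append(best)
    return selected


# Toy usage with made-up numbers: windows 0 and 1 are near-duplicates.
embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
quality = quality_scores(rtg=[1.0, 2.0, 3.0],
                         uncertainty=[0.2, 0.5, 0.9],
                         stage_ids=[0, 0, 1])
batch = greedy_kdpp(build_kernel(embeddings, quality), k=2)
```

In this toy setup the selection favors window 2 (high quality, rare stage) and then one of the two near-duplicate windows rather than both, which is the behavior a quality-diversity kernel is meant to encourage. A faithful reproduction would replace the greedy step with a proper k-DPP sampler and take the embeddings and uncertainty estimates from the trained Decision Transformer.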

Problem

Research questions and friction points this paper is trying to address.

Decision Transformer

robotic manipulation

experience sampling

sample efficiency

exploration-exploitation trade-off

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decision Transformer

experience-aware sampling

k-Determinantal Point Process