🤖 AI Summary
Existing offline safe reinforcement learning methods struggle with real-world constraints that have rich temporal and logical structure. This paper proposes the temporal logic Specification-conditioned Decision Transformer (SDT), which feeds Signal Temporal Logic (STL) specifications into the Decision Transformer architecture as conditioning inputs, so that a single policy is trained to pursue high reward while satisfying temporal safety constraints. Conditioning directly on STL lets the policy express temporal rules that conventional conditioned policies cannot capture, and it allows the desired degree of STL satisfaction to be adjusted at deployment. On the DSRL benchmarks, SDT outperforms existing approaches in learning safe, high-reward policies, and its behavior aligns well with the degree of satisfaction it is conditioned on.
📝 Abstract
Offline safe reinforcement learning (RL) aims to train a constraint-satisfying policy from a fixed dataset. Current state-of-the-art approaches are based on supervised learning with a conditioned policy. However, these approaches fall short in real-world applications that involve complex tasks with rich temporal and logical structures. In this paper, we propose the temporal logic Specification-conditioned Decision Transformer (SDT), a novel framework that combines the expressive power of signal temporal logic (STL), which specifies complex temporal rules that an agent should follow, with the sequential modeling capability of the Decision Transformer (DT). Empirical evaluations on the DSRL benchmarks show that SDT learns safer and higher-reward policies than existing approaches. In addition, SDT's behavior aligns well with the desired degree of satisfaction of the STL specification on which it is conditioned.
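The "degree of satisfaction" of an STL specification is commonly measured with quantitative (robustness) semantics, where a positive value means the trajectory satisfies the formula and a larger value means a wider margin. The sketch below illustrates that standard semantics for two simple formulas on a discrete-time trajectory. It is not the paper's implementation: the function names, signals, and thresholds are hypothetical, and the abstract does not say how SDT encodes this quantity as a conditioning input.

```python
from typing import Sequence


def always_below(signal: Sequence[float], threshold: float) -> float:
    """Robustness of G (x < threshold): the worst-case margin over all steps."""
    return min(threshold - x for x in signal)


def eventually_above(signal: Sequence[float], threshold: float) -> float:
    """Robustness of F (x > threshold): the best-case margin over all steps."""
    return max(x - threshold for x in signal)


def conjunction(*robustness: float) -> float:
    """Robustness of a conjunction: the weakest of its operands."""
    return min(robustness)


# Hypothetical trajectory: speed must always stay below 1.0,
# and progress must eventually exceed 1.0.
speed = [0.25, 0.5, 0.75, 0.5]
progress = [0.0, 0.5, 1.0, 1.5]

rho_safe = always_below(speed, 1.0)         # smallest of 0.75, 0.5, 0.25, 0.5
rho_goal = eventually_above(progress, 1.0)  # largest of -1.0, -0.5, 0.0, 0.5
rho = conjunction(rho_safe, rho_goal)

print(f"safety margin: {rho_safe}, goal margin: {rho_goal}, overall: {rho}")
```

A scalar like `rho` is the kind of value a specification-conditioned policy could be asked to reach, in the same way a Decision Transformer is conditioned on a target return. How SDT actually does this is described in the paper, not in this abstract.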