🤖 AI Summary
This work addresses the challenge of specifying non-Markovian rewards in Markov decision processes with large or even infinite state spaces, where such rewards are difficult to define naturally and reusably. The authors propose a reward specification framework based on Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), in which predicates are first-order formulas rather than Boolean variables. This allows complex task objectives to be expressed uniformly across heterogeneous data domains without handcrafted predicates. LTLfMT formulas are compiled into reward machines, which yield sparse reward signals automatically, and a tailored Hindsight Experience Replay (HER) mechanism is used to cope with that sparsity. Experiments in continuous-control environments indicate that the framework supports natural logical specification of complex tasks, and that the tailored HER is crucial for solving the most challenging sparse-reward problems.
📝 Abstract
In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), a more expressive extension of classical temporal logic in which predicates are first-order formulas over arbitrary first-order theories rather than simple Boolean variables. This enhanced expressiveness enables the specification of complex tasks over unstructured and heterogeneous data domains, promoting a unified and reusable framework that eliminates the need for manual predicate encoding. However, the increased expressive power of LTLfMT introduces additional theoretical and computational challenges compared to standard LTLf specifications. We address these challenges from a theoretical standpoint by identifying a fragment of LTLfMT that is tractable yet sufficiently expressive for reward specification in an infinite-state-space context. From a practical perspective, we introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications into rewards and to address reward sparsity. We evaluate this approach in a continuous-control setting using Non-Linear Arithmetic Theory, showing that it enables natural specification of complex tasks. Experimental results show that a tailored implementation of HER is fundamental to solving tasks with complex goals.
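To make the reward-machine idea concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical 2-D continuous state `(x, y)` and uses plain Python functions as stand-ins for first-order non-linear arithmetic predicates. The names `in_disc` and `RewardMachine`, and the task itself, are invented for illustration. The machine is written by hand here, whereas the paper compiles it from an LTLfMT formula.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]          # hypothetical 2-D continuous state (x, y)
Predicate = Callable[[State], bool]  # stand-in for a first-order arithmetic formula


def in_disc(cx: float, cy: float, r: float) -> Predicate:
    """Non-linear arithmetic predicate: (x - cx)^2 + (y - cy)^2 < r^2."""
    return lambda s: (s[0] - cx) ** 2 + (s[1] - cy) ** 2 < r ** 2


@dataclass
class RewardMachine:
    # transitions[u] is an ordered list of (guard, next_state, reward)
    transitions: Dict[int, List[Tuple[Predicate, int, float]]]
    initial: int = 0

    def step(self, u: int, s: State) -> Tuple[int, float]:
        """Advance from machine state u on environment state s."""
        for guard, nxt, reward in self.transitions.get(u, []):
            if guard(s):
                return nxt, reward
        return u, 0.0  # no guard fired: stay in u, no reward

    def run(self, trace: List[State]) -> float:
        """Total reward collected along a finite trace."""
        u, total = self.initial, 0.0
        for s in trace:
            u, r = self.step(u, s)
            total += r
        return total


# Hand-written machine for the task "eventually visit A, and afterwards
# eventually visit B". Reward is sparse: 1.0 only when the task completes.
A = in_disc(0.0, 0.0, 0.5)
B = in_disc(2.0, 0.0, 0.5)
rm = RewardMachine(transitions={
    0: [(A, 1, 0.0)],  # waiting for A
    1: [(B, 2, 1.0)],  # A seen, waiting for B
    # state 2 is accepting and has no outgoing transitions
})

good = [(1.0, 1.0), (0.1, 0.1), (1.0, 0.0), (2.1, 0.0)]  # A, then B
bad = [(2.1, 0.0), (1.0, 0.0), (0.1, 0.1)]               # B before A
print(rm.run(good), rm.run(bad))
```

The reward depends on the machine state as well as the current environment state, which is what makes it non-Markovian with respect to the MDP alone. Because the only non-zero reward arrives on the final transition, most trajectories yield no signal, and this sparsity is what the abstract's use of HER is meant to address.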