Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version)

📅 2026-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of specifying non-Markovian rewards in Markov decision processes with large or even infinite state spaces, where such rewards are difficult to define naturally and reusably. The authors propose a reward specification framework based on Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), whose predicates are first-order formulas over arbitrary theories, so complex task objectives can be expressed uniformly across heterogeneous data domains without handcrafted predicate encodings. They identify a tractable fragment of LTLfMT, translate its formulas into reward machines, and use a tailored Hindsight Experience Replay (HER) mechanism to counter the sparsity of the resulting rewards. Experiments in a continuous-control setting using nonlinear arithmetic show that the framework supports natural specification of complex tasks and that the tailored HER is fundamental to solving tasks with complex goals.
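To make the contrast with propositional specifications concrete, here is an illustrative formula. It is a sketch only: the variables x, y, the radius r, and the region centres are hypothetical, and the paper's concrete LTLfMT syntax may differ. In propositional LTLf one would write "eventually a, and afterwards eventually b" over handcrafted Boolean labels a and b, whereas with theory predicates the regions can be stated directly as nonlinear arithmetic constraints on the state:

```latex
% Propositional LTLf: a, b are Boolean labels that must be encoded by hand
\Diamond\bigl(a \wedge \Diamond\, b\bigr)

% Same task with first-order (nonlinear arithmetic) predicates over the state (x, y)
\Diamond\Bigl( (x-1)^2 + (y-1)^2 \le r^2 \;\wedge\; \Diamond\bigl( (x-3)^2 + (y-1)^2 \le r^2 \bigr) \Bigr)
```

Here \Diamond is the usual "eventually" operator, interpreted over finite traces.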

📝 Abstract

In this work, we propose a novel framework for the logical specification of non-Markovian rewards in Markov Decision Processes (MDPs) with large state spaces. Our approach leverages Linear Temporal Logic Modulo Theories over finite traces (LTLfMT), a more expressive extension of classical temporal logic in which predicates are first-order formulas of arbitrary first-order theories rather than simple Boolean variables. This enhanced expressiveness enables the specification of complex tasks over unstructured and heterogeneous data domains, promoting a unified and reusable framework that eliminates the need for manual predicate encoding. However, the increased expressive power of LTLfMT introduces additional theoretical and computational challenges compared to standard LTLf specifications. We address these challenges from a theoretical standpoint, identifying a fragment of LTLfMT that is tractable but sufficiently expressive for reward specification in an infinite-state-space context. From a practical perspective, we introduce a method based on reward machines and Hindsight Experience Replay (HER) to translate first-order logic specifications and address reward sparsity. We evaluate this approach in a continuous-control setting using Non-Linear Arithmetic Theory, showing that it enables natural specification of complex tasks. Experimental results show that a tailored implementation of HER is fundamental in solving tasks with complex goals.
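The combination of reward machines and hindsight relabeling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `RewardMachine` class, the `in_disc` predicate, the helper names, and the task ("reach disc A, then disc B") are all hypothetical, and the relabeling shown is the generic HER idea of substituting an achieved state for the goal, not the paper's tailored variant. Guards are nonlinear arithmetic constraints evaluated directly on a continuous state, standing in for first-order predicates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Tuple[float, float]  # hypothetical 2-D position (x, y)
Guard = Callable[[State], bool]


def in_disc(cx: float, cy: float, r: float) -> Guard:
    """Guard for the constraint (x - cx)^2 + (y - cy)^2 <= r^2."""
    return lambda s: (s[0] - cx) ** 2 + (s[1] - cy) ** 2 <= r ** 2


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions carry guards and rewards."""
    initial: int
    accepting: int
    # transitions[u] lists (guard, next_state, reward); the first true guard fires
    transitions: Dict[int, List[Tuple[Guard, int, float]]]
    u: int = field(init=False)

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.u = self.initial

    def step(self, s: State) -> float:
        for guard, nxt, reward in self.transitions.get(self.u, []):
            if guard(s):
                self.u = nxt
                return reward
        return 0.0  # no guard holds: stay in place, zero reward

    def done(self) -> bool:
        return self.u == self.accepting


def make_rm(a_center: State, b_center: State, r: float = 0.5) -> RewardMachine:
    """Machine for 'visit disc A, then disc B' (discs assumed disjoint)."""
    return RewardMachine(
        initial=0,
        accepting=2,
        transitions={
            0: [(in_disc(*a_center, r), 1, 0.0)],
            1: [(in_disc(*b_center, r), 2, 1.0)],  # reward only on completion
        },
    )


def replay(rm: RewardMachine, traj: List[State]) -> List[float]:
    """Recompute the reward sequence of a stored trajectory under rm."""
    rm.reset()
    return [rm.step(s) for s in traj]


traj = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]

# Original goal: B is far away, so the episode yields no reward signal.
rm = make_rm(a_center=(1.0, 1.0), b_center=(5.0, 5.0))
rewards = replay(rm, traj)

# Hindsight: pretend B was centred on the last state actually reached.
relabeled = make_rm(a_center=(1.0, 1.0), b_center=traj[-1])
hindsight_rewards = replay(relabeled, traj)
```

Under the original goal the trajectory earns nothing, while the relabeled machine reaches its accepting state on the final step, which is what lets a sparse-reward learner extract a training signal from an otherwise failed episode.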

Problem

Research questions and friction points this paper is trying to address.

non-Markovian rewards

reward specification

first-order temporal logic

Markov Decision Processes

heterogeneous data domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

LTLfMT

First-Order Temporal Logic

Reward Specification

Hindsight Experience Replay

Non-Markovian Rewards

🔎 Similar Papers

On Generating Explanations for Reinforcement Learning Policies: An Empirical Study

2023-09-29 · IEEE Control Systems Letters · Citations: 0

Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning

2024-02-27 · International Conference on Machine Learning · Citations: 1

Directed Exploration in Reinforcement Learning from Linear Temporal Logic

2024-08-18 · arXiv.org · Citations: 1