Expressive Reward Synthesis with the Runtime Monitoring Language

📅 2025-10-17

🤖 AI Summary

In reinforcement learning, reward functions are usually treated as black-box mappings from state-action pairs to scalar values. These mappings offer little interpretability and make it hard to specify complex tasks involving non-Markovian dynamics, counting constraints, or parameterized conditions. Classical reward machines add structure by encoding rewards as finite-state automata, but their expressivity is limited to regular languages. To overcome this limit, we propose a class of language-based reward machines built on the Runtime Monitoring Language (RML). By exploiting RML's built-in memory, the approach supports structured, event-driven reward specification for non-regular and non-Markovian tasks, improving both the flexibility and the interpretability of reward specifications. Experiments demonstrate the expressiveness of the approach and highlight advantages in flexible event handling and task specification over existing Reward Machine-based methods.

📝 Abstract

A key challenge in reinforcement learning (RL) is reward (mis)specification, whereby imprecisely defined reward functions can result in unintended, possibly harmful, behaviours. Indeed, reward functions in RL are typically treated as black-box mappings from state-action pairs to scalar values. While effective in many settings, this approach provides no information about why rewards are given, which can hinder learning and interpretability. Reward Machines address this issue by representing reward functions as finite state automata, enabling the specification of structured, non-Markovian reward functions. However, their expressivity is typically bounded by regular languages, leaving them unable to capture more complex behaviours such as counting or parametrised conditions. In this work, we build on the Runtime Monitoring Language (RML) to develop a novel class of language-based Reward Machines. By leveraging the built-in memory of RML, our approach can specify reward functions for non-regular, non-Markovian tasks. We demonstrate the expressiveness of our approach through experiments, highlighting additional advantages in flexible event-handling and task specification over existing Reward Machine-based methods.
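
To make the expressivity gap concrete, here is a minimal sketch under stated assumptions. It is plain Python, not RML syntax, and not the authors' implementation; `RewardMachine`, `CountingRewardMonitor`, `total_reward`, and the `pick`/`drop` events are hypothetical names chosen for illustration. The first class is a classic reward machine: a finite table of event-labelled transitions. The second adds one integer counter, which is enough to reward the task "drop exactly as many items as were picked", i.e. traces of the form pick^n drop^n. That pattern is not a regular language, so no reward machine with a fixed, finite set of states can reward it correctly for every n.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class RewardMachine:
    """Classic reward machine: finitely many states, event-labelled transitions."""
    initial: str
    # (state, event) -> (next_state, reward); a finite table, hence regular expressivity
    delta: Dict[Tuple[str, str], Tuple[str, float]]
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def step(self, event: str) -> float:
        # Events with no matching transition leave the state unchanged and give 0 reward.
        self.state, reward = self.delta.get((self.state, event), (self.state, 0.0))
        return reward


@dataclass
class CountingRewardMonitor:
    """Hypothetical monitor with memory: rewards pick^n drop^n (n >= 1), a non-regular pattern."""
    phase: str = "collect"
    count: int = 0  # unbounded counter: the memory a finite automaton lacks

    def step(self, event: str) -> float:
        if self.phase == "collect" and event == "pick":
            self.count += 1
        elif self.phase in ("collect", "deliver") and event == "drop" and self.count > 0:
            self.phase = "deliver"
            self.count -= 1
            if self.count == 0:
                self.phase = "done"
                return 1.0  # every picked item has been dropped
        elif self.phase == "deliver" and event == "pick":
            self.phase = "failed"  # picking again mid-delivery violates the pattern
        return 0.0


def total_reward(machine, events: Iterable[str]) -> float:
    """Feed a trace of events to a machine and sum the rewards it emits."""
    return sum(machine.step(e) for e in events)


if __name__ == "__main__":
    # Regular task "pick, then drop": a three-state reward machine suffices.
    rm = RewardMachine("u0", {("u0", "pick"): ("u1", 0.0), ("u1", "drop"): ("u2", 1.0)})
    print(total_reward(rm, ["pick", "drop"]))  # 1.0
    print(total_reward(CountingRewardMonitor(), ["pick"] * 3 + ["drop"] * 3))  # 1.0
    print(total_reward(CountingRewardMonitor(), ["pick"] * 3 + ["drop"] * 2))  # 0.0
```

The counter is what lets the monitor track an unbounded quantity; a classic reward machine could only approximate this task by unrolling one state per count up to some fixed bound.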

Problem

Overcoming reward misspecification in reinforcement learning systems

Extending reward machines beyond regular language expressivity limits

Enabling specification of non-regular, non-Markovian reward functions

Innovation

Using Runtime Monitoring Language for reward synthesis

Leveraging built-in memory for non-regular tasks

Creating language-based Reward Machines with flexible event-handling
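
Parametrised conditions can be sketched in the same spirit. In this hypothetical Python example (not RML and not the authors' code; `ParametricDeliveryMonitor` and the `(kind, identifier)` event encoding are assumptions), each event carries an object identifier, and the monitor remembers which identifiers have been picked, rewarding a delivery only when its identifier matches an earlier pick. A classic reward machine over a fixed event alphabet would need a distinct state for every combination of held objects, which becomes infeasible when the set of identifiers is large or unbounded.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical event encoding: (event type, object identifier).
Event = Tuple[str, str]


@dataclass
class ParametricDeliveryMonitor:
    """Rewards delivering any object that was previously picked up, whatever its identifier."""
    held: Set[str] = field(default_factory=set)  # memory: identifiers picked but not yet delivered

    def step(self, event: Event) -> float:
        kind, obj = event
        if kind == "pick":
            self.held.add(obj)
        elif kind == "deliver" and obj in self.held:
            self.held.remove(obj)
            return 1.0  # delivery matches an earlier pick with the same parameter
        return 0.0  # other events, or deliveries of objects never picked, earn nothing


if __name__ == "__main__":
    monitor = ParametricDeliveryMonitor()
    trace = [("pick", "a"), ("pick", "b"), ("deliver", "b"), ("deliver", "c"), ("deliver", "a")]
    print([monitor.step(e) for e in trace])  # [0.0, 0.0, 1.0, 0.0, 1.0]
```

Here the memory is a set of parameter values rather than a counter, but the idea is the same: the reward depends on stored information about past events that a fixed finite state space cannot hold.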