Physics-Informed Reward Machines

📅 2025-08-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Reinforcement learning under non-Markovian reward structures suffers from weak programmability, limited expressivity, and low sample efficiency. To address these challenges, we propose the physics-informed reward machine (pRM). A pRM explicitly encodes domain-specific physical priors as symbolic logic constraints in the reward machine's architecture, enabling decoupled modeling of known environmental priors and unknown dynamics. It supports counterfactual experience generation and differentiable reward shaping, substantially enhancing both the programmability and expressive capacity of reward specifications. The framework applies uniformly to discrete and continuous physical environments. Experiments across multiple control benchmarks demonstrate that the pRM significantly reduces sample complexity, accelerates policy convergence, and improves modeling fidelity for temporally dependent, history-sensitive non-Markovian rewards.
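To make the two mechanisms above concrete, here is a minimal sketch of a reward machine and counterfactual experience generation on a toy "reach A, then B" task. The class, state names, and events are illustrative, not taken from the paper; the counterfactual idea (replaying one environment transition through every RM state) follows the standard reward-machine literature.

```python
# A reward machine (RM) is a finite-state machine over high-level events;
# the reward of a transition depends on the RM state, which is what makes
# the overall reward non-Markovian in the raw observations alone.

class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.transitions = transitions
        self.initial_state = initial_state
        self.states = {s for s, _ in transitions} | {
            s2 for s2, _ in transitions.values()
        }

    def step(self, rm_state, event):
        # Events with no listed transition leave the RM state
        # unchanged and yield zero reward.
        return self.transitions.get((rm_state, event), (rm_state, 0.0))


def counterfactual_experiences(rm, event):
    """For one environment transition that emitted `event`, synthesize a
    (rm_state, reward, next_rm_state) triple for EVERY RM state, so a
    single environment sample updates the policy under all RM states."""
    return [(u, *reversed(rm.step(u, event))) for u in rm.states]


# Toy task (hypothetical): reach goal A, then goal B.
rm = RewardMachine(
    transitions={
        ("u0", "at_A"): ("u1", 0.0),
        ("u1", "at_B"): ("u_done", 1.0),
    },
    initial_state="u0",
)
```

A single real step that emits `at_B` thus also teaches the agent what would have happened had it already passed A, which is where the sample-efficiency gain comes from.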

πŸ“ Abstract
Reward machines (RMs) provide a structured way to specify non-Markovian rewards in reinforcement learning (RL), thereby improving both expressiveness and programmability. Viewed more broadly, they separate what is known about the environment, captured by the reward machine, from what remains unknown and must be discovered through sampling. This separation supports techniques such as counterfactual experience generation and reward shaping, which reduce sample complexity and speed up learning. We introduce physics-informed reward machines (pRMs), symbolic machines designed to express complex learning objectives and reward structures for RL agents, thereby enabling more programmable, expressive, and efficient learning. We present RL algorithms capable of exploiting pRMs via counterfactual experiences and reward shaping. Our experimental results show that these techniques accelerate reward acquisition during training. We demonstrate the expressiveness and effectiveness of pRMs through experiments in both finite and continuous physical environments, illustrating that incorporating pRMs significantly improves learning efficiency across several control tasks.
Problem

Research questions and friction points this paper is trying to address.

Incorporating physics knowledge into reward machines for reinforcement learning
Enabling more programmable and expressive reward structures
Accelerating reward acquisition and improving learning efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-informed reward machines for structured rewards
Counterfactual experiences and reward shaping techniques
Improved learning efficiency in control tasks
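The reward-shaping contribution listed above is typically realized as potential-based shaping over RM states. The following is a hedged sketch, not the paper's implementation: the potential values and state names are hypothetical, and the preserved-optimality property is the classical result for potential-based shaping.

```python
# Potential-based reward shaping over RM states (illustrative sketch).
# Phi assigns each RM state a potential, e.g. progress toward the
# accepting state; the shaped reward r + gamma*Phi(u') - Phi(u)
# leaves optimal policies unchanged while giving denser feedback.

GAMMA = 0.99
PHI = {"u0": 0.0, "u1": 0.5, "u_done": 1.0}  # hypothetical potentials


def shaped_reward(r, u, u_next, gamma=GAMMA, phi=PHI):
    """Shape the environment reward r for an RM transition u -> u_next."""
    return r + gamma * phi[u_next] - phi[u]
```

Advancing the RM (e.g. `u0 -> u1`) earns a positive bonus even when the raw reward is zero, which is how shaping densifies sparse non-Markovian rewards.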