Reward Machines for Deep RL in Noisy and Uncertain Environments

📅 2024-05-31
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Deep RL methods that use Reward Machines rely on a ground-truth interpretation of the domain-specific vocabulary underlying the reward function; in real-world settings, partial observability and noisy sensing make such interpretations unreliable, leading to incorrect reward assignment and suboptimal policies. Method: The paper characterizes this setting as a POMDP and proposes a suite of RL algorithms that exploit Reward Machine task structure under an uncertain interpretation of the vocabulary. Contribution/Results: Through theory and experiments, the authors expose pitfalls in naive approaches to this problem while demonstrating that task structure can still be successfully leveraged under noisy interpretations of the vocabulary.

📝 Abstract
Reward Machines provide an automaton-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing the underlying structure of a reward function, they enable the decomposition of an RL task, leading to impressive gains in sample efficiency. Although Reward Machines and similar formal specifications have a rich history of application towards sequential decision-making problems, they critically rely on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function; such ground-truth interpretations are elusive in the real world due in part to partial observability and noisy sensing. In this work, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that exploit task structure under uncertain interpretation of the domain-specific vocabulary. Through theory and experiments, we expose pitfalls in naive approaches to this problem while simultaneously demonstrating how task structure can be successfully leveraged under noisy interpretations of the vocabulary.
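To make the abstract's core object concrete, here is a minimal sketch of a Reward Machine: a finite automaton whose transitions fire on truth assignments to a domain-specific vocabulary of propositions, with rewards attached to edges. This is an illustrative toy, not the paper's implementation; the state and proposition names (`u0`, `key`, `door`) are hypothetical.

```python
from typing import Dict, Tuple

class RewardMachine:
    """Toy Reward Machine: edges map (rm_state, true_props) -> (next_state, reward)."""

    def __init__(self, initial: str,
                 transitions: Dict[Tuple[str, frozenset], Tuple[str, float]]):
        self.state = initial
        self.transitions = transitions

    def step(self, true_props: frozenset) -> float:
        """Advance on the set of propositions observed true; return the edge reward."""
        key = (self.state, true_props)
        if key in self.transitions:
            self.state, reward = self.transitions[key]
            return reward
        return 0.0  # no matching edge: self-loop with zero reward

# Two-stage task: first pick up the key, then open the door.
rm = RewardMachine("u0", {
    ("u0", frozenset({"key"})): ("u1", 0.0),
    ("u1", frozenset({"door"})): ("u_acc", 1.0),
})
r1 = rm.step(frozenset({"key"}))   # RM advances to u1
r2 = rm.step(frozenset({"door"}))  # task complete, reward 1.0
```

The paper's central complication is that the agent never observes `true_props` directly: a noisy labelling function only gives uncertain evidence about which propositions hold, which is what motivates the POMDP formulation.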
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Deep Learning
Uncertain Environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task Structure Exploitation
Reward Machine Integration
Noise Resilience in Learning
Andrew C. Li
University of Toronto, Vector Institute
Zizhao Chen
Cornell University
Toryn Q. Klassen
University of Toronto, Vector Institute, Schwartz Reisman Institute for Technology and Society
Pashootan Vaezipoor
Georgian.io, Vector Institute
Rodrigo Toro Icarte
Professor of Computer Science, Pontificia Universidad Católica de Chile
Artificial Intelligence, Reinforcement Learning
Sheila A. McIlraith
University of Toronto, Vector Institute, Schwartz Reisman Institute for Technology and Society