Active Reward Machine Inference From Raw State Trajectories

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of inferring implicit reward machine structures from raw state trajectories when reward signals, state labels, and observations of reward machine nodes are all unavailable. To tackle this problem, the authors propose a novel approach that integrates trajectory analysis, automaton inference, and active learning: by leveraging policy-informed, incremental trajectory queries, the method efficiently reconstructs both the reward machine and its labeling function. Notably, this is the first technique to achieve unsupervised reward machine inference, recovering target reward machines with high accuracy across multiple grid-world tasks and validating its effectiveness and data efficiency even under extreme information scarcity.
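To make the object being inferred concrete: a reward machine is a finite-state automaton whose transitions fire on high-level labels and which emits rewards along the way. The following is a minimal illustrative sketch, not the paper's formalism; all names and the two-stage example task are hypothetical.

```python
# Illustrative reward machine: a finite automaton over high-level labels
# that emits a reward on each transition. Hypothetical sketch, not the
# paper's actual definitions.
class RewardMachine:
    def __init__(self, states, initial, delta, reward):
        self.states = states      # set of machine nodes
        self.u = initial          # current node
        self.delta = delta        # (node, label) -> next node
        self.reward = reward      # (node, label) -> reward emitted

    def step(self, label):
        r = self.reward.get((self.u, label), 0.0)
        self.u = self.delta.get((self.u, label), self.u)  # self-loop if undefined
        return r

# Two-stage task: observe label "a", then "b", to earn the final reward.
rm = RewardMachine(
    states={0, 1, 2}, initial=0,
    delta={(0, "a"): 1, (1, "b"): 2},
    reward={(1, "b"): 1.0},
)
rewards = [rm.step(l) for l in ["b", "a", "b"]]  # [0.0, 0.0, 1.0]
```

The memory captured by the machine node is what makes the task "multi-stage": the same label "b" pays off only after "a" has been seen.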
📝 Abstract
Reward machines are automaton-like structures that capture the memory required to accomplish a multi-stage task. When combined with reinforcement learning or optimal control methods, they can be used to synthesize robot policies to achieve such tasks. However, specifying a reward machine by hand, including a labeling function capturing high-level features that the decisions are based on, can be a daunting task. This paper deals with the problem of learning reward machines directly from raw state and policy information. As opposed to existing works, we assume no access to observations of rewards, labels, or machine nodes, and show what trajectory data is sufficient for learning the reward machine in this information-scarce regime. We then extend the result to an active learning setting where we incrementally query trajectory extensions to improve data (and indirectly computational) efficiency. Results are demonstrated with several grid world examples.
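The active-learning idea in the abstract, querying trajectory data incrementally to discriminate between hypothesis machines, can be sketched with a toy candidate-elimination loop. This is a loud simplification: the paper's setting gives no access to rewards or labels, whereas this sketch lets queried label sequences reveal their reward sequences purely to illustrate the query/refine structure. All names and the two candidate machines are hypothetical.

```python
from itertools import product

def run(rm_delta, rm_reward, labels):
    """Simulate a dict-encoded reward machine on a label sequence."""
    u, out = 0, []
    for l in labels:
        out.append(rm_reward.get((u, l), 0.0))
        u = rm_delta.get((u, l), u)  # self-loop on undefined transitions
    return tuple(out)

# Ground-truth machine, hidden from the learner: "a" then "b" pays 1.0.
true_delta = {(0, "a"): 1, (1, "b"): 2}
true_reward = {(1, "b"): 1.0}

def oracle(labels):
    # Stand-in for a trajectory query; here it exposes the reward
    # sequence, which the paper's harder setting does NOT allow.
    return run(true_delta, true_reward, labels)

# Two competing hypotheses: sequential task vs. "b" alone suffices.
candidates = [
    ({(0, "a"): 1, (1, "b"): 2}, {(1, "b"): 1.0}),
    ({(0, "b"): 1},              {(0, "b"): 1.0}),
]

# Active loop: issue short queries, drop inconsistent candidates,
# stop as soon as a single hypothesis remains.
for labels in product("ab", repeat=2):
    if len(candidates) == 1:
        break
    obs = oracle(list(labels))
    candidates = [c for c in candidates if run(*c, list(labels)) == obs]

learned = candidates[0]
```

The query ("b", "a") is the one that separates the two hypotheses: the "b alone" machine pays immediately while the sequential machine does not, so a single informative query suffices, which is the data-efficiency argument for active querying.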
Problem

Research questions and friction points this paper is trying to address.

reward machine
reinforcement learning
active learning
trajectory data
automaton inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward machine
active learning
trajectory inference
reinforcement learning
automaton learning