Active Reward Machine Inference From Raw State Trajectories

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of inferring implicit reward machine structures from raw state trajectories when reward signals, state labels, and observations of reward machine nodes are all unavailable. To tackle this problem, the authors propose a novel approach that integrates trajectory analysis, automaton inference, and active learning: by leveraging policy-informed, incremental trajectory queries, the method efficiently reconstructs both the reward machine and its labeling function. Notably, this is the first technique to achieve unsupervised reward machine inference, recovering target reward machines with high accuracy across multiple grid-world tasks and validating its effectiveness and data efficiency even under extreme information scarcity.
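To make the object being inferred concrete: a reward machine is a finite-state automaton whose transitions fire on high-level labels and which emits rewards along the way. The following is a minimal illustrative sketch, not the paper's formalism; all names and the two-stage example task are hypothetical.

```python
# Illustrative reward machine: a finite automaton over high-level labels
# that emits a reward on each transition. Hypothetical sketch, not the
# paper's actual definitions.
class RewardMachine:
    def __init__(self, states, initial, delta, reward):
        self.states = states      # set of machine nodes
        self.u = initial          # current node
        self.delta = delta        # (node, label) -> next node
        self.reward = reward      # (node, label) -> reward emitted

    def step(self, label):
        r = self.reward.get((self.u, label), 0.0)
        self.u = self.delta.get((self.u, label), self.u)  # self-loop if undefined
        return r

# Two-stage task: observe label "a", then "b", to earn the final reward.
rm = RewardMachine(
    states={0, 1, 2}, initial=0,
    delta={(0, "a"): 1, (1, "b"): 2},
    reward={(1, "b"): 1.0},
)
rewards = [rm.step(l) for l in ["b", "a", "b"]]  # [0.0, 0.0, 1.0]
```

The memory captured by the machine node is what makes the task "multi-stage": the same label "b" pays off only after "a" has been seen.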
📝 Abstract
Reward machines are automaton-like structures that capture the memory required to accomplish a multi-stage task. When combined with reinforcement learning or optimal control methods, they can be used to synthesize robot policies to achieve such tasks. However, specifying a reward machine by hand, including a labeling function capturing high-level features that the decisions are based on, can be a daunting task. This paper deals with the problem of learning reward machines directly from raw state and policy information. As opposed to existing works, we assume no access to observations of rewards, labels, or machine nodes, and show what trajectory data is sufficient for learning the reward machine in this information-scarce regime. We then extend the result to an active learning setting where we incrementally query trajectory extensions to improve data (and indirectly computational) efficiency. Results are demonstrated with several grid world examples.
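The active-learning idea in the abstract, querying trajectory data incrementally to discriminate between hypothesis machines, can be sketched with a toy candidate-elimination loop. This is a loud simplification: the paper's setting gives no access to rewards or labels, whereas this sketch lets queried label sequences reveal their reward sequences purely to illustrate the query/refine structure. All names and the two candidate machines are hypothetical.

```python
from itertools import product

def run(rm_delta, rm_reward, labels):
    """Simulate a dict-encoded reward machine on a label sequence."""
    u, out = 0, []
    for l in labels:
        out.append(rm_reward.get((u, l), 0.0))
        u = rm_delta.get((u, l), u)  # self-loop on undefined transitions
    return tuple(out)

# Ground-truth machine, hidden from the learner: "a" then "b" pays 1.0.
true_delta = {(0, "a"): 1, (1, "b"): 2}
true_reward = {(1, "b"): 1.0}

def oracle(labels):
    # Stand-in for a trajectory query; here it exposes the reward
    # sequence, which the paper's harder setting does NOT allow.
    return run(true_delta, true_reward, labels)

# Two competing hypotheses: sequential task vs. "b" alone suffices.
candidates = [
    ({(0, "a"): 1, (1, "b"): 2}, {(1, "b"): 1.0}),
    ({(0, "b"): 1},              {(0, "b"): 1.0}),
]

# Active loop: issue short queries, drop inconsistent candidates,
# stop as soon as a single hypothesis remains.
for labels in product("ab", repeat=2):
    if len(candidates) == 1:
        break
    obs = oracle(list(labels))
    candidates = [c for c in candidates if run(*c, list(labels)) == obs]

learned = candidates[0]
```

The query ("b", "a") is the one that separates the two hypotheses: the "b alone" machine pays immediately while the sequential machine does not, so a single informative query suffices, which is the data-efficiency argument for active querying.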
Problem

Research questions and friction points this paper is trying to address.

reward machine
reinforcement learning
active learning
trajectory data
automaton inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward machine
active learning
trajectory inference
reinforcement learning
automaton learning