Learning Robust Reward Machines from Noisy Labels

πŸ“… 2024-08-27
πŸ›οΈ International Conference on Principles of Knowledge Representation and Reasoning
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Learning robust reward machines (RMs) from noisy execution traces remains challenging in reinforcement learning. Method: a closed-loop co-learning framework featuring (1) Bayesian posterior beliefs that explicitly quantify trajectory uncertainty and tolerate label noise; (2) an alternating online update mechanism that jointly optimizes the RM and the policy; (3) posterior-belief-based probabilistic reward shaping, enabling stable extraction of RMs under high noise; and (4) integration of inductive logic programming (ILP), finite-state machine modeling, and online RM relearning. Results: the learned RMs closely approximate the ground-truth structures despite noise, and agents guided by them perform on par with agents given handcrafted RMs, demonstrating robustness and practical applicability.
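The online relearning step of the framework can be illustrated with a minimal sketch: relearn the RM whenever a trace is believed, under the posterior, to contradict the current machine. The function name, the per-step comparison, and the thresholding rule below are assumptions for illustration, not the paper's exact criterion.

```python
def trace_inconsistent(rm_predicts, observed, beliefs, threshold=0.5):
    """Belief-weighted consistency check between the current reward machine's
    per-step predictions and the (noisy) observations from one trace.

    beliefs[i] is the Bayesian posterior that observation i is correct.
    Returns True when the belief mass on mismatching steps exceeds
    `threshold`, which would trigger relearning a new RM from the traces.
    """
    mismatch = sum(b for p, o, b in zip(rm_predicts, observed, beliefs) if p != o)
    total = sum(beliefs)
    return total > 0 and mismatch / total > threshold
```

With one mismatch out of three equally believed steps, the trace is kept (`trace_inconsistent([1, 0, 1], [1, 1, 1], [0.9, 0.9, 0.9])` is `False`); a fully contradicted trace triggers relearning.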

πŸ“ Abstract
This paper presents PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent's task into different subtasks. PROB-IRM uses a state-of-the-art inductive logic programming framework robust to noisy examples to learn RMs from noisy traces using the Bayesian posterior degree of beliefs, thus ensuring robustness against inconsistencies. Pivotal for the results is the interleaving between RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning the RM from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
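For intuition, a reward machine as described in the abstract — a finite-state machine that decomposes the task into subtasks and emits rewards on transitions — can be sketched minimally as follows. The state names, labels, and the two-subtask "key then door" task are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    """Finite-state machine mapping (state, observed label) -> (next state, reward)."""
    initial: str
    terminal: set
    # delta[(state, label)] = (next_state, reward); missing keys self-loop with 0 reward
    delta: dict = field(default_factory=dict)

    def step(self, state, label):
        return self.delta.get((state, label), (state, 0.0))

    def run(self, labels):
        """Feed a trace of labels; return total reward and final RM state."""
        state, total = self.initial, 0.0
        for lab in labels:
            state, r = self.step(state, lab)
            total += r
            if state in self.terminal:
                break
        return total, state

# A two-subtask example: first observe "key", then "door".
rm = RewardMachine(
    initial="u0",
    terminal={"u2"},
    delta={("u0", "key"): ("u1", 0.0), ("u1", "door"): ("u2", 1.0)},
)
```

Running `rm.run(["key", "door"])` reaches the terminal state with reward 1.0, while an out-of-order trace such as `["door", "key"]` stalls in an intermediate state with no reward, which is what makes the machine a decomposition of the task into ordered subtasks.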
Problem

Research questions and friction points this paper is trying to address.

Learning robust reward machines from noisy traces
Interleaving reward machine and policy learning
Ensuring robustness against noisy label inconsistencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Bayesian posterior for noisy label robustness
Interleaves reward machine and policy learning dynamically
Employs probabilistic reward shaping for faster training
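One way to read the probabilistic reward shaping point above: when the RM state is uncertain, shape against a belief distribution over RM states rather than a single known state. Below is a minimal potential-based sketch; the function names, the belief-weighted expectation, and the example potentials are assumptions for illustration, not the paper's exact formulation.

```python
def expected_potential(belief, potential):
    """Expected shaping potential under a belief distribution over RM states.
    belief: {rm_state: probability}; potential: {rm_state: potential value}."""
    return sum(p * potential[u] for u, p in belief.items())

def shaping_bonus(belief, next_belief, potential, gamma=0.99):
    """Potential-based shaping term F = gamma * E[phi(u')] - E[phi(u)],
    computed in expectation over the posterior belief about the RM state."""
    return (gamma * expected_potential(next_belief, potential)
            - expected_potential(belief, potential))

# Two RM states: u0 (start, low potential) and u1 (subtask done, high potential).
phi = {"u0": 0.0, "u1": 1.0}
bonus = shaping_bonus({"u0": 0.7, "u1": 0.3}, {"u0": 0.2, "u1": 0.8}, phi)
```

As belief mass shifts toward higher-potential RM states, the bonus is positive, steering the agent toward subtask completion even before the RM state is known with certainty.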
πŸ”Ž Similar Papers
No similar papers found.