📝 Abstract
Reward machines (RMs) are an effective approach for addressing non-Markovian rewards in reinforcement learning (RL) through finite-state machines. Traditional RMs, which label edges with propositional logic formulae, inherit the limited expressivity of propositional logic. This limitation hinders the learnability and transferability of RMs, since complex tasks require numerous states and edges. To overcome these challenges, we propose First-Order Reward Machines ($\texttt{FORM}$s), which use first-order logic to label edges, resulting in more compact and transferable RMs. We introduce a novel method for $\textbf{learning}$ $\texttt{FORM}$s and a multi-agent formulation for $\textbf{exploiting}$ them and facilitating their transferability, in which multiple agents collaboratively learn policies for a shared $\texttt{FORM}$. Our experimental results demonstrate the scalability of $\texttt{FORM}$s relative to traditional RMs. Specifically, we show that $\texttt{FORM}$s can be effectively learnt for tasks where traditional RM learning approaches fail. We also show significant improvements in learning speed and task transferability thanks to the multi-agent learning framework and the abstraction provided by the first-order language.
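To make the edge-labelling distinction concrete, here is a minimal sketch (not the paper's implementation; all names are hypothetical) of a reward machine whose edges carry predicates over the events observed at each step. A propositional label must enumerate its atoms explicitly, while a quantified, first-order-style label ("every key has been collected") remains a single edge no matter how many objects the task involves:

```python
# Illustrative sketch of a reward machine with predicate-labelled edges.
# Class and variable names are hypothetical, not from the paper.

class RewardMachine:
    def __init__(self, initial_state):
        self.state = initial_state
        self.edges = []  # list of (src, predicate, dst, reward)

    def add_edge(self, src, predicate, dst, reward):
        self.edges.append((src, predicate, dst, reward))

    def step(self, events):
        """Advance on the set of true ground atoms observed this step;
        return the reward of the transition taken (0.0 if none fires)."""
        for src, pred, dst, reward in self.edges:
            if src == self.state and pred(events):
                self.state = dst
                return reward
        return 0.0

# Propositional-style label: a fixed conjunction, so a task over n keys
# needs every atom written out (and a new formula whenever n changes).
prop_label = lambda ev: {"have_key1", "have_key2"} <= ev

# First-order-style label: a quantified predicate over a set of objects;
# the edge stays the same however many keys the environment contains.
keys = {"key1", "key2", "key3"}
fo_label = lambda ev: all(f"have_{k}" in ev for k in keys)

rm = RewardMachine("u0")
rm.add_edge("u0", fo_label, "u1", 1.0)

print(rm.step({"have_key1"}))                            # no edge fires yet
print(rm.step({"have_key1", "have_key2", "have_key3"}))  # quantified label holds
```

The compactness claim in the abstract corresponds to the `fo_label` edge above: the propositional encoding grows with the number of objects, whereas the quantified edge does not.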