Reward Shaping and Action Masking for Compositional Tasks using Behavior Trees and LLMs

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches struggle to achieve both reactivity to subtask failure and modularity across varying objects in compositional tasks. This work proposes Masking Reward Behavior Trees (MRBTs), which use behavior trees as verifiable symbolic structures that integrate large language model (LLM) generation, SMT-based formal verification, and neurosymbolic reinforcement learning. The resulting framework provides reward shaping and action masking that are modular, transferable, and formally verifiable. Across five compositional tasks, MRBTs consistently improve both training efficiency and task success rates over baseline methods and over MRBTs without action masking.
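The summary describes MRBTs as formally verifiable with an SMT solver. The paper's actual specifications and encoding are not given here, so the following is only a minimal, dependency-free sketch of the underlying idea: a specification is verified by searching for a state that violates it. An SMT solver would decide satisfiability of the negated specification symbolically; this sketch substitutes exhaustive enumeration over a few boolean predicates. The predicates `holding_key` and `door_open`, the `mask` function, and the specification `completing_action_unmasked` are hypothetical, and `mask` contains a deliberate bug so that a counterexample is found.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

State = Dict[str, bool]


def find_counterexample(
    predicates: List[str],
    spec: Callable[[State], bool],
) -> Optional[State]:
    """Return a state that violates `spec`, or None if `spec` holds everywhere.

    Stand-in for an SMT query (assumption: the real pipeline encodes this
    symbolically). Enumerating all 2^n assignments only works for tiny n.
    """
    for values in product([False, True], repeat=len(predicates)):
        state = dict(zip(predicates, values))
        if not spec(state):
            return state  # witness that NOT spec is satisfiable
    return None


# Hypothetical mask function for a two-subtask task: pick up a key, then open a door.
def mask(state: State) -> Dict[str, bool]:
    if not state["holding_key"]:
        return {"move": True, "pick": True, "toggle": False}
    if not state["door_open"]:
        return {"move": True, "pick": False, "toggle": False}  # bug: "toggle" is masked
    return {"move": True, "pick": True, "toggle": True}


# Hypothetical specification: the action that completes the active subtask stays unmasked.
def completing_action_unmasked(state: State) -> bool:
    if not state["holding_key"]:
        return mask(state)["pick"]
    if not state["door_open"]:
        return mask(state)["toggle"]
    return True


print(find_counterexample(["holding_key", "door_open"], completing_action_unmasked))
# {'holding_key': True, 'door_open': False}
```

Unmasking `"toggle"` in the second branch of `mask` makes `find_counterexample` return `None`. Because enumeration grows as 2^n in the number of predicates, a solver-based encoding is the practical choice beyond toy cases.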

📝 Abstract

Decomposing complex tasks into a sequence of simpler subtasks can improve learning efficiency for an autonomous agent. Reinforcement learning (RL) can be used to optimize agent policies to complete subtasks, but requires well-defined subtask rewards and benefits from action masking. Recent work uses large language models (LLMs) to automate reward shaping and action masking; however, none of these approaches fully addresses reactivity to subtask failure and modularity across varying objects in compositional tasks. To overcome these challenges, we develop the masking reward behavior tree (MRBT), a symbolic structure used as a reactive and modular reward and action mask function. We design an MRBT template and derive logical specifications to construct and verify MRBTs for a sequence of object-interaction subtasks. Further, we develop an automated pipeline that uses an LLM to generate MRBTs robust to varying task objects, an SMT solver to verify correctness of specifications, and a neurosymbolic RL loop to train agents on compositional tasks. Experiments demonstrate the successful generation and refinement of five MRBTs, which consistently improve training efficiency and task success rates over baselines and over MRBTs without action masking. We further highlight three advantages of MRBTs: transferability, modularity, and verifiability.
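The abstract describes an MRBT as one symbolic structure that serves as both a reactive reward function and an action mask function for a sequence of object-interaction subtasks. The paper's MRBT template is not reproduced here; the following is a minimal sketch, under stated assumptions, of how a reactive Sequence over subtask postconditions can produce both signals. The `Subtask` and `MaskedRewardSequence` classes, the key/door task, and the progress-difference reward are hypothetical. The tree is re-evaluated from the root on every tick, so if an earlier subtask's postcondition stops holding, that subtask becomes active again, the agent receives a negative reward, and the mask switches back to that subtask's actions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Dict[str, bool]


@dataclass(frozen=True)
class Subtask:
    """One object-interaction subtask (hypothetical, not the paper's template)."""
    name: str
    done: Callable[[State], bool]    # postcondition, re-checked on every tick
    allowed_actions: FrozenSet[str]  # actions left unmasked while this subtask is active


class MaskedRewardSequence:
    """Reactive Sequence node used as both a reward function and an action mask."""

    def __init__(self, subtasks: List[Subtask], all_actions: List[str]):
        self.subtasks = subtasks
        self.all_actions = all_actions

    def active_index(self, state: State) -> int:
        # Tick from the root: the first subtask whose postcondition is false is
        # active, so a completed subtask that fails again becomes active again.
        for i, sub in enumerate(self.subtasks):
            if not sub.done(state):
                return i
        return len(self.subtasks)  # every subtask is satisfied

    def tick(self, prev_state: State, state: State) -> Tuple[float, List[bool]]:
        prev_progress = self.active_index(prev_state)
        progress = self.active_index(state)
        # +1 per newly completed subtask, -1 per regression (assumed reward scale).
        reward = float(progress - prev_progress)
        if progress == len(self.subtasks):
            mask = [True] * len(self.all_actions)
        else:
            allowed = self.subtasks[progress].allowed_actions
            mask = [a in allowed for a in self.all_actions]
        return reward, mask


ACTIONS = ["move", "pick", "toggle", "drop"]
tree = MaskedRewardSequence(
    [
        Subtask("pick_key", lambda s: s["holding_key"], frozenset({"move", "pick"})),
        Subtask("open_door", lambda s: s["door_open"], frozenset({"move", "toggle"})),
    ],
    ACTIONS,
)

s0 = {"holding_key": False, "door_open": False}
s1 = {"holding_key": True, "door_open": False}
s2 = {"holding_key": False, "door_open": False}  # key dropped: subtask failure

print(tree.tick(s0, s1))  # (1.0, [True, False, True, False])
print(tree.tick(s1, s2))  # (-1.0, [True, True, False, False])
```

Each `Subtask` is parameterized only by its postcondition and allowed actions, so retargeting the task to a different object means replacing those two fields rather than rewriting the tree, which loosely mirrors the modularity the abstract claims. The reward above is undiscounted potential-based shaping with the number of leading satisfied subtasks as the potential; the paper's actual reward values may differ.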

Problem

Research questions and friction points this paper is trying to address.

reward shaping

action masking

compositional tasks

reactivity

modularity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward Shaping

Action Masking

Behavior Trees
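Action masking, listed above as one of the contributions, restricts which actions the policy may choose. The abstract does not specify how the MRBT mask is applied inside the RL algorithm; a common approach, assumed in this sketch, is to give masked actions zero probability, which is equivalent to setting their logits to negative infinity before the softmax. The function names `masked_softmax` and `sample_action` are hypothetical, and only the Python standard library is used.

```python
import math
import random
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over allowed actions; masked actions get probability exactly 0.0.

    Equivalent to adding -inf to the logits of masked actions before a softmax.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask must leave at least one action available")
    # Subtract the largest allowed logit for numerical stability.
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], mask: Sequence[bool], rng: random.Random) -> int:
    """Sample an action index; zero-weight (masked) indices are never drawn."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, 3.0]
mask = [True, False, True, False]  # e.g. produced by an MRBT for the active subtask
probs = masked_softmax(logits, mask)
print(probs[1], probs[3])  # 0.0 0.0
actions = {sample_action(logits, mask, rng) for _ in range(1000)}
print(actions <= {0, 2})  # True
```

Index 3 has the largest logit but is never selected because the mask removes it, which is how masking keeps exploration inside the actions relevant to the active subtask.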