RAMP: Hybrid DRL for Online Learning of Numeric Action Models

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing algorithms for learning numeric action models are offline and require expert traces as input, which makes such models hard to obtain when an agent can only interact with its environment online. This work proposes RAMP, a framework that learns numeric action models online by simultaneously training a deep reinforcement learning policy, learning an action model from past interactions, and planning with that model whenever possible, so that the three components form a positive feedback loop. To integrate reinforcement learning with numeric planning, the authors introduce Numeric PDDLGym, a framework that automatically converts numeric planning problems into Gym environments. Experiments on standard IPC numeric domains show that RAMP significantly outperforms PPO, a well-known deep reinforcement learning algorithm, in both solvability and plan quality.
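To make the idea of exposing a numeric planning problem through a Gym-style interface more concrete, here is a minimal sketch. It is not the Numeric PDDLGym API: the names `NumericAction`, `ToyNumericEnv`, and `make_counter_env`, the one-counter domain, the reward values, and the choice to leave the state unchanged when an inapplicable action is selected are all assumptions made for illustration. The `reset`/`step` return shapes only loosely mirror the usual Gym-style convention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # numeric fluents, e.g. {"x": 0.0}


@dataclass(frozen=True)
class NumericAction:
    """Hypothetical grounded action: a precondition test and an effect."""
    name: str
    precondition: Callable[[State], bool]
    effect: Callable[[State], State]


class ToyNumericEnv:
    """Hypothetical Gym-style wrapper around a numeric planning problem."""

    def __init__(self, init: State, goal: Callable[[State], bool],
                 actions: List[NumericAction], max_steps: int = 50):
        self.init, self.goal = dict(init), goal
        self.actions, self.max_steps = actions, max_steps
        self.state, self.steps = dict(init), 0

    def reset(self):
        """Restore the initial state and return (observation, info)."""
        self.state, self.steps = dict(self.init), 0
        return dict(self.state), {}

    def step(self, action_index: int):
        """Apply one action; inapplicable actions leave the state unchanged."""
        action = self.actions[action_index]
        applicable = action.precondition(self.state)
        if applicable:
            self.state = action.effect(self.state)
        self.steps += 1
        terminated = self.goal(self.state)
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.01  # assumed reward shaping
        info = {"applicable": applicable, "action": action.name}
        return dict(self.state), reward, terminated, truncated, info


def make_counter_env() -> ToyNumericEnv:
    """Toy problem: raise counter x from 0 until x >= 3."""
    actions = [
        NumericAction("inc", lambda s: s["x"] < 5,
                      lambda s: {**s, "x": s["x"] + 1}),
        NumericAction("dec", lambda s: s["x"] > 0,
                      lambda s: {**s, "x": s["x"] - 1}),
    ]
    return ToyNumericEnv({"x": 0.0}, lambda s: s["x"] >= 3, actions)
```

Reporting applicability in `info` is one possible design choice: it lets a learner tell a failed action apart from a successful one without changing the observation itself.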

📝 Abstract

Automated planning algorithms require an action model specifying the preconditions and effects of each action, but obtaining such a model is often hard. Learning action models from observations is feasible, but existing algorithms for numeric domains are offline, requiring expert traces as input. We propose the Reinforcement learning, Action Model learning, and Planning (RAMP) strategy for learning numeric planning action models online via interactions with the environment. RAMP simultaneously trains a Deep Reinforcement Learning (DRL) policy, learns a numeric action model from past interactions, and uses that model to plan future actions when possible. These components form a positive feedback loop: the RL policy gathers data to refine the action model, while the planner generates plans to continue training the RL policy. To facilitate this integration of RL and numeric planning, we developed Numeric PDDLGym, an automated framework for converting numeric planning problems to Gym environments. Experimental results on standard IPC numeric domains show that RAMP significantly outperforms PPO, a well-known DRL algorithm, in terms of solvability and plan quality.
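The feedback loop described in the abstract can be sketched on a toy problem. This is an illustration of the control flow only, under heavy assumptions: tabular Q-learning stands in for the DRL policy (the paper evaluates against PPO), a simple interval-based learner with constant effects stands in for the numeric action model learner, and breadth-first search stands in for the numeric planner. The one-counter domain and every name below (`true_step`, `IntervalModel`, `ramp_sketch`) are hypothetical.

```python
import random
from collections import defaultdict, deque

ACTIONS = ["inc", "dec"]
GOAL, MAX_STEPS = 4, 30


def true_step(x, action):
    """Hidden ground-truth dynamics of a toy one-counter domain."""
    if action == "inc" and x < 5:
        return x + 1
    if action == "dec" and x > 0:
        return x - 1
    return x  # inapplicable action: state unchanged


class IntervalModel:
    """Toy learner: per-action interval precondition and constant effect."""

    def __init__(self):
        self.pre = {}    # action -> (lowest x, highest x) where it worked
        self.delta = {}  # action -> observed change in x

    def observe(self, action, x, next_x):
        lo, hi = self.pre.get(action, (x, x))
        self.pre[action] = (min(lo, x), max(hi, x))
        self.delta[action] = next_x - x

    def plan(self, start, goal):
        """Breadth-first search over the learned model; None if no plan."""
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            x, path = frontier.popleft()
            if x == goal:
                return path
            for action, (lo, hi) in self.pre.items():
                nxt = x + self.delta[action]
                if lo <= x <= hi and nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [action]))
        return None


def ramp_sketch(episodes=200, seed=0, alpha=0.5, gamma=0.95, eps=0.3):
    rng = random.Random(seed)
    model, q = IntervalModel(), defaultdict(float)
    planned_episodes = 0
    for _episode in range(episodes):
        x = 0
        plan = model.plan(x, GOAL) or []  # plan when the model allows it
        planned_episodes += bool(plan)
        for _step in range(MAX_STEPS):
            if plan:
                action = plan.pop(0)            # follow the planner
            elif rng.random() < eps:
                action = rng.choice(ACTIONS)    # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(x, a)])
            next_x = true_step(x, action)
            if next_x != x:                     # action was applicable
                model.observe(action, x, next_x)
            done = next_x == GOAL
            reward = 1.0 if done else -0.01
            best_next = 0.0 if done else max(q[(next_x, a)] for a in ACTIONS)
            # Q-learning is off-policy, so planned steps also train the policy.
            q[(x, action)] += alpha * (reward + gamma * best_next - q[(x, action)])
            x = next_x
            if done:
                break
    return model, q, planned_episodes
```

Calling `ramp_sketch()` returns the learned model, the Q-table, and the number of episodes that started with a plan. In this toy setting the learned preconditions only cover states where an action was actually seen to succeed, so plans found over the learned model are executable in the true environment.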

Problem

Research questions and friction points this paper is trying to address.

action model learning

numeric planning

online learning

automated planning

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

online action model learning

hybrid deep reinforcement learning

numeric planning