Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning

📅 2026-03-23

🤖 AI Summary
This work addresses the challenge of enabling mobile robots to perform explainable and iteratively optimizable task-level planning in uncertain environments. The authors propose a closed-loop verbal reinforcement learning framework that integrates large language models with vision-language models to optimize symbolic policies—represented as behavior trees—through natural language feedback, without requiring gradient-based updates. This approach establishes, for the first time, a transparent and interpretable closed-loop learning mechanism at the symbolic planning level, supporting explicit causal reasoning and human-understandable policy evolution. Experiments on a physical mobile robot demonstrate that the method effectively enables adaptive recovery from task failures, interpretable policy refinement, and reliable real-world deployment.

📝 Abstract
We propose a new Verbal Reinforcement Learning (VRL) framework for interpretable task-level planning in mobile robotic systems operating under execution uncertainty. The framework follows a closed-loop architecture that enables iterative policy improvement through interaction with the physical environment. In our framework, executable Behavior Trees are repeatedly refined by a Large Language Model actor using structured natural-language feedback produced by a Vision-Language Model critic that observes the physical robot and execution traces. Unlike conventional reinforcement learning, policy updates in VRL occur directly at the symbolic planning level, without gradient-based optimization. This enables transparent reasoning, explicit causal feedback, and human-interpretable policy evolution. We validate the proposed framework on a real mobile robot performing a multi-stage manipulation and navigation task under execution uncertainty. Experimental results show that the framework supports explainable policy improvements, closed-loop adaptation to execution failures, and reliable deployment on physical robotic systems.
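The abstract describes a closed loop in which a behavior tree is executed on the robot, a Vision-Language Model critic turns the observed outcome into natural-language feedback, and a Large Language Model actor revises the symbolic policy without any gradient updates. A minimal sketch of that loop is below; the `execute_behavior_tree`, `vlm_critic`, and `llm_actor` functions are hypothetical stand-ins for the robot runtime and model calls, not the paper's implementation.

```python
# Sketch of the closed-loop Verbal Reinforcement Learning (VRL) idea:
# an LLM "actor" edits a symbolic policy (a behavior tree, here a string)
# using verbal feedback from a VLM "critic". All three functions are toy
# stubs standing in for real robot execution and model inference.

def execute_behavior_tree(bt: str) -> dict:
    """Stand-in for running the behavior tree on the physical robot.

    Returns an execution trace; here the toy task succeeds only once the
    policy contains a grasp retry.
    """
    success = "retry_grasp" in bt
    return {"success": success, "trace": "ok" if success else "grasp failed"}

def vlm_critic(trace: dict) -> str:
    """Stand-in for a VLM that converts observations into verbal feedback."""
    if trace["success"]:
        return "Task completed."
    return "The grasp failed; add a retry with regrasp before navigating."

def llm_actor(bt: str, feedback: str) -> str:
    """Stand-in for an LLM that rewrites the behavior tree from feedback."""
    if "retry" in feedback:
        return bt.replace("grasp", "retry_grasp", 1)
    return bt

def vrl_loop(bt: str, max_iters: int = 5):
    """Closed-loop policy refinement: execute, criticize, rewrite."""
    for _ in range(max_iters):
        trace = execute_behavior_tree(bt)
        if trace["success"]:
            return bt, True
        feedback = vlm_critic(trace)   # verbal "reward" signal
        bt = llm_actor(bt, feedback)   # symbolic update, no gradients
    return bt, False

policy, ok = vrl_loop("sequence(grasp, navigate, place)")
```

Every intermediate artifact in the loop (the tree, the feedback string, the edit) is human-readable text, which is what gives the approach its claimed interpretability relative to gradient-based RL.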
Problem

Research questions and friction points this paper is trying to address.

Verbal Reinforcement Learning
Task-Level Planning
Execution Uncertainty
Interpretable Policy
Closed-Loop Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verbal Reinforcement Learning
Behavior Trees
Large Language Model
Vision-Language Model
Interpretable Planning
Dmitrii Plotnikov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Iaroslav Kolomiets
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Dmitrii Maliukov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Dmitrij Kosenkov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Daniia Zinniatullina
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Artem Trandofilov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Georgii Gazaryan
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Kirill Bogatikov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Timofei Kozlov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Igor Duchinskii
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Mikhail Konenkov
Skolkovo Institute of Science and Technology
Robotics, AI, VLM, LLM, VR
Miguel Altamirano Cabrera
Research Scientist, Skolkovo Institute of Science and Technology
Haptics, Robotics, Tactile Sensation, Computer Vision
Dzmitry Tsetserukou
Associate Professor, Skolkovo Institute of Science and Technology (Skoltech)
Robotics, Haptics, UAV Swarm, AI, VR