Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning

📅 2026-03-23

🤖 AI Summary
This work addresses the challenge of enabling mobile robots to perform explainable and iteratively optimizable task-level planning in uncertain environments. The authors propose a closed-loop verbal reinforcement learning framework that integrates large language models with vision-language models to optimize symbolic policies—represented as behavior trees—through natural language feedback, without requiring gradient-based updates. This approach establishes, for the first time, a transparent and interpretable closed-loop learning mechanism at the symbolic planning level, supporting explicit causal reasoning and human-understandable policy evolution. Experiments on a physical mobile robot demonstrate that the method effectively enables adaptive recovery from task failures, interpretable policy refinement, and reliable real-world deployment.

📝 Abstract
We propose a new Verbal Reinforcement Learning (VRL) framework for interpretable task-level planning in mobile robotic systems operating under execution uncertainty. The framework follows a closed-loop architecture that enables iterative policy improvement through interaction with the physical environment. In our framework, executable Behavior Trees are repeatedly refined by a Large Language Model actor using structured natural-language feedback produced by a Vision-Language Model critic that observes the physical robot and execution traces. Unlike conventional reinforcement learning, policy updates in VRL occur directly at the symbolic planning level, without gradient-based optimization. This enables transparent reasoning, explicit causal feedback, and human-interpretable policy evolution. We validate the proposed framework on a real mobile robot performing a multi-stage manipulation and navigation task under execution uncertainty. Experimental results show that the framework supports explainable policy improvements, closed-loop adaptation to execution failures, and reliable deployment on physical robotic systems.
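The abstract describes a closed loop in which a behavior tree is executed on the robot, a Vision-Language Model critic turns the observed outcome into natural-language feedback, and a Large Language Model actor revises the symbolic policy without any gradient updates. A minimal sketch of that loop is below; the `execute_behavior_tree`, `vlm_critic`, and `llm_actor` functions are hypothetical stand-ins for the robot runtime and model calls, not the paper's implementation.

```python
# Sketch of the closed-loop Verbal Reinforcement Learning (VRL) idea:
# an LLM "actor" edits a symbolic policy (a behavior tree, here a string)
# using verbal feedback from a VLM "critic". All three functions are toy
# stubs standing in for real robot execution and model inference.

def execute_behavior_tree(bt: str) -> dict:
    """Stand-in for running the behavior tree on the physical robot.

    Returns an execution trace; here the toy task succeeds only once the
    policy contains a grasp retry.
    """
    success = "retry_grasp" in bt
    return {"success": success, "trace": "ok" if success else "grasp failed"}

def vlm_critic(trace: dict) -> str:
    """Stand-in for a VLM that converts observations into verbal feedback."""
    if trace["success"]:
        return "Task completed."
    return "The grasp failed; add a retry with regrasp before navigating."

def llm_actor(bt: str, feedback: str) -> str:
    """Stand-in for an LLM that rewrites the behavior tree from feedback."""
    if "retry" in feedback:
        return bt.replace("grasp", "retry_grasp", 1)
    return bt

def vrl_loop(bt: str, max_iters: int = 5):
    """Closed-loop policy refinement: execute, criticize, rewrite."""
    for _ in range(max_iters):
        trace = execute_behavior_tree(bt)
        if trace["success"]:
            return bt, True
        feedback = vlm_critic(trace)   # verbal "reward" signal
        bt = llm_actor(bt, feedback)   # symbolic update, no gradients
    return bt, False

policy, ok = vrl_loop("sequence(grasp, navigate, place)")
```

Every intermediate artifact in the loop (the tree, the feedback string, the edit) is human-readable text, which is what gives the approach its claimed interpretability relative to gradient-based RL.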
Problem

Research questions and friction points this paper is trying to address.

Verbal Reinforcement Learning
Task-Level Planning
Execution Uncertainty
Interpretable Policy
Closed-Loop Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verbal Reinforcement Learning
Behavior Trees
Large Language Model
Vision-Language Model
Interpretable Planning
Dmitrii Plotnikov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Iaroslav Kolomiets
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Dmitrii Maliukov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Dmitrij Kosenkov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Daniia Zinniatullina
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Artem Trandofilov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Georgii Gazaryan
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Kirill Bogatikov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Timofei Kozlov
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Igor Duchinskii
Intelligent Space Robotics Laboratory, Skolkovo Institute of Science and Technology
Mikhail Konenkov
Skolkovo Institute of Science and Technology
Robotics, AI, VLM, LLM, VR
Miguel Altamirano Cabrera
Research Scientist, Skolkovo Institute of Science and Technology
Haptics, Robotics, Tactile Sensation, Computer Vision
Dzmitry Tsetserukou
Associate Professor, Skolkovo Institute of Science and Technology (Skoltech)
Robotics, Haptics, UAV Swarm, AI, VR