BTGenBot-2: Efficient Behavior Tree Generation with Small Language Models

📅 2026-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses key limitations of existing large language model (LLM)-driven robotic task planning approaches—namely, their closed-source nature, high computational overhead, and lack of standardized task representations—which hinder deployment in real-world systems. We propose an open-source, lightweight 1B-parameter LLM that directly maps natural language instructions and action primitives into executable XML-based behavior trees, enabling zero-shot generation and runtime error recovery. To our knowledge, this is the first approach to achieve lightweight, open-source, zero-shot behavior tree synthesis for robotics. We also introduce the first standardized benchmark for LLM-driven behavior trees, encompassing 52 navigation and manipulation tasks. Evaluated in NVIDIA Isaac Sim, our method achieves average success rates of 90.38% and 98.07% under zero-shot and one-shot settings, respectively, with up to a 16× speedup in inference latency, outperforming larger models such as GPT-5 and Claude Opus 4.1.

📝 Abstract
Recent advances in robot learning increasingly rely on LLM-based task planning, leveraging the ability of LLMs to bridge natural language with executable actions. While prior work has shown strong performance, widespread adoption of these models in robotics remains challenging because 1) existing methods are often closed-source or computationally intensive, neglecting actual deployment on real-world physical systems, and 2) there is no universally accepted, plug-and-play representation for robotic task generation. Addressing these challenges, we propose BTGenBot-2, a 1B-parameter open-source small language model that directly converts natural language task descriptions and a list of robot action primitives into executable behavior trees in XML. Unlike prior approaches, BTGenBot-2 enables zero-shot BT generation and error recovery at both inference time and runtime, while remaining lightweight enough for resource-constrained robots. We further introduce the first standardized benchmark for LLM-based BT generation, covering 52 navigation and manipulation tasks in NVIDIA Isaac Sim. Extensive evaluations demonstrate that BTGenBot-2 consistently outperforms GPT-5, Claude Opus 4.1, and larger open-source models across both functional and non-functional metrics, achieving average success rates of 90.38% in zero-shot and 98.07% in one-shot settings, while delivering up to 16× faster inference compared to the previous BTGenBot.
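For context, executable XML behavior trees of the kind described here are commonly written in the BehaviorTree.CPP dialect (also used by ROS 2 Nav2). The sketch below is purely illustrative and not taken from the paper: the action primitives MoveTo, PickObject, and PlaceObject are hypothetical names standing in for whatever primitives the robot exposes.

```xml
<!-- Illustrative sketch only: MoveTo, PickObject, and PlaceObject are
     hypothetical action primitives, not nodes defined by BTGenBot-2. -->
<root BTCPP_format="4">
  <BehaviorTree ID="MainTree">
    <Sequence name="fetch_and_deliver">
      <MoveTo goal="table"/>
      <!-- A retry decorator is one simple way BTs express runtime error recovery -->
      <RetryUntilSuccessful num_attempts="3">
        <PickObject target="cup"/>
      </RetryUntilSuccessful>
      <MoveTo goal="shelf"/>
      <PlaceObject target="cup"/>
    </Sequence>
  </BehaviorTree>
</root>
```

A tree like this is what a model in this setting would emit from an instruction such as "bring the cup from the table to the shelf" plus the list of available primitives; the tick-based Sequence and decorator semantics are what make the plan directly executable by a BT engine.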
Problem

Research questions and friction points this paper is trying to address.

robot task planning
behavior tree generation
small language models
real-world deployment
standardized representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Behavior Tree Generation
Small Language Model
Zero-shot Task Planning
Robotics Benchmark
Error Recovery