CommandSwarm: Safety-Aware Natural Language-to-Behavior-Tree Generation for Robotic Swarms

📅 2026-05-08

🤖 AI Summary

This work addresses the challenge of safely and reliably translating ambiguous natural-language instructions from non-expert users into executable behavior trees (BTs) for swarm robotics. The authors propose CommandSwarm, a safety-aware language-to-BT pipeline. It builds command-level safety filtering and deterministic parser validation (against a whitelist of executable swarm primitives) directly into the generation process. The pipeline combines multilingual translation, constrained prompting and 4-bit quantized large language models. Eleven open models are benchmarked, and Falcon3-Instruct-10B and Mistral-7B-v3 perform best under few-shot prompting. Falcon3-Instruct-10B is then adapted to the domain with LoRA fine-tuning on a 2,063-example synthetic instruction–BT corpus. This raises its zero-shot BLEU from 0.267 to 0.663, its ROUGE-L from 0.366 to 0.692, and its parser-accepted syntactic validity from 0% to 72%. System-level validation further shows that high generation quality alone does not guarantee deployment safety, so parsing and safety gating must remain in the pipeline as execution gates.
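The BLEU and ROUGE-L figures above measure textual overlap with reference BTs. The Python below is a minimal sketch of what ROUGE-L captures: it computes sentence-level ROUGE-L F1 from the longest common subsequence (LCS) of two token sequences. The summary does not state which implementation or tokenizer the authors used. Whitespace tokenization and the F1 (β = 1) variant are therefore assumptions here, and the node names in the usage note are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens processed so far from a and b[:j].
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the match diagonally
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from a or from b
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1, assuming plain whitespace tokenization."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)
```

Take a reference of `<Sequence> <MoveTo/> <Aggregate/> </Sequence>` (hypothetical tokens) and a candidate that swaps the two actions. They share an LCS of 3 of 4 tokens, so `rouge_l_f1` returns 0.75, even though the candidate executes the actions in a different order. Overlap scores of this kind say nothing about executability. That is why the parser and safety gates are still needed.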

📝 Abstract

Natural-language interfaces can make swarm robotics more accessible to non-expert operators, but they must translate ambiguous user intent into executable swarm behaviors without unsupported actions, malformed programs, or unsafe plans. This paper presents CommandSwarm, a safety-aware language-to-behavior-tree pipeline for generating XML behavior trees (BTs) from speech or text commands. The system combines multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted large language model (LLM), and deterministic parser validation against a whitelist of executable swarm primitives. We evaluate eleven open 6.7B–14B parameter LLMs, all using 4-bit quantization, on representative swarm-control scenarios under zero-shot, one-shot, and two-shot prompting. Falcon3-Instruct-10B and Mistral-7B-v3 are the strongest prompt-engineered candidates, reaching BLEU scores above 0.60 and high syntactic validity in few-shot settings. LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction–BT corpus improves zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0% to 72%. Translation experiments further show that SeamlessM4T v2-large and EuroLLM-9B provide the best quality-latency trade-offs for the multilingual front end. The results indicate that compact, quantized, domain-adapted LLMs can generate useful swarm BTs when embedded in a validated systems pipeline. They also show that parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.
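The abstract describes two execution gates. A command-level safety filter runs before generation, and a deterministic parser accepts a BT only if it is well-formed XML built from whitelisted primitives. The sketch below illustrates that control flow in Python with the standard-library `xml.etree.ElementTree` parser. Everything specific in it is hypothetical: the node names in `ALLOWED_NODES`, the phrases in `BLOCKED_TERMS`, and the `generate` callable that stands in for the LoRA-adapted LLM. A keyword blocklist is a deliberately simple stand-in, since the summary does not describe how CommandSwarm's safety filter actually works.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Set, Tuple

# Hypothetical whitelist; the actual CommandSwarm primitive set is not given in the abstract.
ALLOWED_NODES: Set[str] = {
    "root", "BehaviorTree",                                         # assumed container tags
    "Sequence", "Fallback",                                         # assumed control-flow nodes
    "MoveTo", "Aggregate", "Disperse", "FormLine", "ReturnHome",    # hypothetical swarm actions
}

# Hypothetical blocklist for a toy command-level safety filter.
BLOCKED_TERMS: Tuple[str, ...] = ("attack", "collide with", "ram into", "disable geofence")


def command_is_safe(command: str) -> bool:
    """Reject a command if any blocked term appears as a whole word or phrase."""
    text = command.lower()
    return not any(re.search(rf"\b{re.escape(term)}\b", text) for term in BLOCKED_TERMS)


def validate_bt(xml_text: str, allowed: Set[str] = ALLOWED_NODES) -> Tuple[bool, str]:
    """Deterministic gate: the BT must parse as XML and use only whitelisted tags."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, f"malformed XML: {exc}"
    for node in root.iter():  # iter() visits the root element and every descendant
        if node.tag not in allowed:
            return False, f"unsupported node: {node.tag}"
    return True, "ok"


def run_pipeline(command: str, generate: Callable[[str], str]) -> Tuple[Optional[str], str]:
    """Filter the command, generate a BT, then validate it before execution."""
    if not command_is_safe(command):
        return None, "rejected: unsafe command"  # the model is never called
    bt_xml = generate(command)  # stand-in for the LLM generation step
    ok, reason = validate_bt(bt_xml)
    return (bt_xml if ok else None), reason
```

The safety check runs before `generate`, so an unsafe command never reaches the model. `validate_bt` then rejects both malformed XML and well-formed trees that reference unsupported actions, two of the failure modes the abstract names alongside unsafe plans.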

Problem

Research questions and friction points this paper is trying to address.

natural language-to-behavior-tree

robotic swarms

safety-aware generation

executable swarm behaviors

ambiguous user intent

Innovation

Methods, ideas, or system contributions that make the work stand out.

Behavior Trees

Safety-Aware Generation

LoRA-Adapted LLM
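In its standard form (Hu et al., 2021), LoRA adaptation freezes the pretrained weight matrix and trains only two small low-rank factors, as shown below. This summary does not report the rank r or scaling α used for Falcon3-Instruct-10B. The 4096 × 4096 projection in the parameter count is a hypothetical size chosen only to make the arithmetic concrete.

```latex
h = W_0 x + \Delta W\, x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k} \text{ (frozen)},\;
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r(d + k) \text{ instead of } dk

d = k = 4096,\; r = 16:\quad r(d + k) = 16 \cdot 8192 = 131{,}072
\;\text{ vs. }\; dk = 16{,}777{,}216 \;(\approx 0.78\%)
```

In setups such as QLoRA, the frozen W_0 can remain 4-bit quantized while only A and B are trained. The summary does not say whether CommandSwarm's fine-tuning was done this way.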