AI Summary
To address the challenge of autonomous robotic operation in large-scale, unstructured outdoor environments, this paper introduces SPINE, the first lightweight, multimodal autonomous navigation framework supporting kilometer-scale deployment. To tackle open-world challenges including active exploration, complex-terrain navigation, anomaly detection and response, and severe edge-computing resource constraints, SPINE adopts an LLM-agnostic architecture enabling end-to-end, LLM-driven task planning; proposes a language model distillation method tailored for SWaP-constrained platforms, substantially reducing inference overhead; and implements a natural-language-driven, on-device UAV planning system, validated over multi-kilometer field trials in real-world wilderness settings. Key contributions are: (1) the first demonstration of LLM-augmented long-range autonomous navigation in field environments; (2) the open-source release of the first on-device, language-model-driven UAV natural-language planning system; and (3) empirical validation of foundation model feasibility and robustness on resource-constrained robotic systems.
Abstract
The integration of foundation models (FMs) into robotics has enabled robots to understand natural language and reason about the semantics of their environments. However, existing FM-enabled robots primarily operate in closed-world settings, where the robot is given a full prior map or has a full view of its workspace. This paper addresses the deployment of FM-enabled robots in the field, where missions often require a robot to operate in large-scale and unstructured environments. To effectively accomplish these missions, robots must actively explore their environments, navigate obstacle-cluttered terrain, handle unexpected sensor inputs, and operate under compute constraints. We discuss recent deployments of SPINE, our LLM-enabled autonomy framework, in field robotic settings. To the best of our knowledge, we present the first demonstration of large-scale LLM-enabled robot planning in unstructured environments, with missions spanning several kilometers. SPINE is agnostic to the particular LLM used, which allows us to distill small language models capable of running onboard size, weight, and power (SWaP)-limited platforms. Building on preliminary model distillation work, we then present the first language-driven UAV planner using on-device language models. We conclude by proposing several promising directions for future research.