🤖 AI Summary
Heterogeneous navigation coordination between unmanned aerial vehicles (UAVs) and automated guided vehicles (AGVs) in dynamic warehouse environments is challenged by UAV energy constraints, payload limitations, and stringent, safety-critical collision-avoidance requirements.
Method: This paper proposes a semantic impedance coordination framework driven by a vision-language model with retrieval-augmented generation (VLM-RAG). It integrates VLM-based environmental semantic understanding with RAG-enhanced dynamic generation of impedance-control parameters, and further incorporates virtual impedance linkages and adaptive topology reconfiguration to enable multi-robot semantic collaboration.
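The RAG step can be pictured as a nearest-neighbor lookup over a small knowledge base that maps semantic scene descriptions to impedance parameters. The following is a minimal sketch under that assumption; the scene entries, parameter values, and the bag-of-words similarity are all illustrative, not taken from the paper (a real system would use learned embeddings over VLM captions).

```python
# Hedged sketch: RAG-style retrieval of impedance-control parameters.
# The knowledge-base entries and gains below are hypothetical examples.
from collections import Counter
import math

KNOWLEDGE_BASE = [
    {"scene": "open aisle no obstacles", "stiffness": 120.0, "damping": 15.0},
    {"scene": "narrow corridor with people nearby", "stiffness": 40.0, "damping": 30.0},
    {"scene": "cluttered shelving short obstacles", "stiffness": 60.0, "damping": 25.0},
]

def _embed(text):
    # Toy bag-of-words "embedding"; stands in for a real text encoder.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_impedance_params(vlm_caption):
    # Return the knowledge-base entry most similar to the VLM's caption.
    q = _embed(vlm_caption)
    return max(KNOWLEDGE_BASE, key=lambda e: _cosine(q, _embed(e["scene"])))

params = retrieve_impedance_params("people walking in a narrow corridor")
```

A caption like "people walking in a narrow corridor" retrieves the softer, more damped parameter set, which matches the paper's idea of adapting compliance to the perceived environment.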
Contribution/Results: Unlike conventional artificial potential field (APF) or fixed-impedance approaches, the framework enables real-time, multimodal perception–guided response and cooperative obstacle avoidance. Evaluated across 12 realistic warehouse scenarios, it achieves a 92% task success rate. Under ideal illumination, VLM-RAG improves object recognition and control-parameter matching accuracy by 8%, while ground robots maintain stable safe following and dynamic collision avoidance performance.
📝 Abstract
With the growing demand for efficient logistics, unmanned aerial vehicles (UAVs) are increasingly being paired with automated guided vehicles (AGVs). While UAVs offer the ability to navigate through dense environments and varying altitudes, they are limited by battery life, payload capacity, and flight duration, necessitating coordinated ground support.
Focusing on heterogeneous navigation, SwarmVLM addresses these limitations by enabling semantic collaboration between UAVs and ground robots through impedance control. The system leverages a Vision-Language Model (VLM) and Retrieval-Augmented Generation (RAG) to adjust impedance-control parameters in response to environmental changes. In this framework, the UAV acts as the leader, using Artificial Potential Field (APF) planning for real-time navigation, while the ground robot follows via virtual impedance links with adaptive link topology to avoid collisions with short obstacles.
The system demonstrated a 92% success rate across 12 real-world trials. Under optimal lighting conditions, the VLM-RAG framework improved accuracy in object detection and impedance-parameter selection by 8%. The mobile robot prioritized avoidance of short obstacles, occasionally deviating laterally by up to 50 cm from the UAV path, demonstrating safe navigation in a cluttered setting.