LogisticsVLN: Vision-Language Navigation For Low-Altitude Terminal Delivery Based on Agentic UAVs

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing vision-language navigation (VLN) methods for UAVs focus on coarse-grained, long-range targets and fail to meet the precision requirements of low-altitude last-mile delivery. Method: This paper formally defines and addresses the fine-grained aerial last-mile delivery VLN task. We propose a modular architecture built on multimodal large language models that integrates a lightweight large language model (LLM) with a vision-language model (VLM), enabling joint natural language understanding, floor-level localization, fine-grained object detection, and autonomous action decision-making. We further construct the Vision-Language Delivery (VLD) dataset in the CARLA simulator, the first benchmark tailored for aerial last-mile delivery. Contribution/Results: Extensive end-to-end evaluation and ablation studies on VLD demonstrate significant improvements in navigation accuracy and environmental robustness over baseline approaches.
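The modular pipeline described above (LLM-based request understanding, floor-level localization, object detection, and action decision-making) can be pictured roughly as in the sketch below. This is a minimal illustrative sketch, not the authors' implementation: the class and method names, prompts, action set, and the `llm`/`vlm` interfaces are all assumptions.

```python
# Hypothetical sketch of a modular LLM+VLM delivery pipeline.
# Names, prompts, and interfaces are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class DeliveryRequest:
    """Parsed customer request: target floor and the object that marks the drop-off point."""
    target_floor: int
    target_object: str


class LogisticsVLNPipeline:
    def __init__(self, llm, vlm):
        self.llm = llm  # lightweight LLM for language understanding and decisions
        self.vlm = vlm  # vision-language model for grounding and detection

    def understand_request(self, request_text: str) -> DeliveryRequest:
        # LLM extracts structured fields (floor, object) from the free-form request.
        reply = self.llm.generate(
            f"Extract the target floor and target object from: {request_text}. "
            "Answer as 'floor,object'."
        )
        floor, obj = reply.split(",", 1)
        return DeliveryRequest(target_floor=int(floor), target_object=obj.strip())

    def localize_floor(self, image, request: DeliveryRequest) -> str:
        # VLM estimates the UAV's current floor relative to the target floor.
        return self.vlm.query(
            image,
            f"Which floor is the UAV facing? The target floor is {request.target_floor}.",
        )

    def detect_object(self, image, request: DeliveryRequest) -> bool:
        # VLM checks whether the requested object is visible in the current view.
        answer = self.vlm.query(
            image, f"Is there a {request.target_object} visible? Answer yes or no."
        )
        return answer.strip().lower().startswith("yes")

    def decide_action(self, image, request: DeliveryRequest) -> str:
        # LLM chooses the next discrete action from the perception summaries.
        floor_info = self.localize_floor(image, request)
        found = self.detect_object(image, request)
        prompt = (
            f"Floor estimate: {floor_info}. Target object found: {found}. "
            "Choose one action: ascend, descend, move_forward, move_left, move_right, stop."
        )
        return self.llm.generate(prompt).strip()
```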

📝 Abstract
The growing demand for intelligent logistics, particularly fine-grained terminal delivery, underscores the need for autonomous UAV (Unmanned Aerial Vehicle)-based delivery systems. However, most existing last-mile delivery studies rely on ground robots, while current UAV-based Vision-Language Navigation (VLN) tasks primarily focus on coarse-grained, long-range goals, making them unsuitable for precise terminal delivery. To bridge this gap, we propose LogisticsVLN, a scalable aerial delivery system built on multimodal large language models (MLLMs) for autonomous terminal delivery. LogisticsVLN integrates lightweight Large Language Models (LLMs) and Vision-Language Models (VLMs) in a modular pipeline for request understanding, floor localization, object detection, and action decision-making. To support research and evaluation in this new setting, we construct the Vision-Language Delivery (VLD) dataset within the CARLA simulator. Experimental results on the VLD dataset showcase the feasibility of the LogisticsVLN system. In addition, we conduct subtask-level evaluations of each module of our system, offering valuable insights for improving the robustness and real-world deployment of foundation model-based vision-language delivery systems.
Problem

Research questions and friction points this paper is trying to address.

Autonomous UAV delivery for fine-grained terminal logistics
Bridging the gap in Vision-Language Navigation for precise delivery
Developing a multimodal LLM system for low-altitude aerial tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MLLMs for autonomous terminal delivery
Integrates LLMs and VLMs in a modular pipeline
Constructs the VLD dataset in the CARLA simulator (see the sketch below)
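As a rough illustration of what a CARLA-based delivery benchmark episode and its evaluation could look like, the sketch below defines a hypothetical episode record and a simple success-rate metric. The field names, coordinate convention, and the 2 m success threshold are assumptions for illustration; the actual VLD format and metrics are those defined in the paper.

```python
# Hypothetical episode schema and success-rate metric for a vision-language
# delivery benchmark. Field names and the threshold are illustrative assumptions.
from dataclasses import dataclass
from math import dist
from typing import List, Tuple


@dataclass
class DeliveryEpisode:
    instruction: str                           # natural-language delivery request
    start_pose: Tuple[float, float, float]     # UAV start position (x, y, z)
    goal_position: Tuple[float, float, float]  # target delivery point, e.g. a window
    target_floor: int
    target_object: str


def success_rate(episodes: List[DeliveryEpisode],
                 final_positions: List[Tuple[float, float, float]],
                 threshold_m: float = 2.0) -> float:
    """Fraction of episodes in which the UAV stops within threshold_m of the goal."""
    successes = sum(
        dist(ep.goal_position, pos) <= threshold_m
        for ep, pos in zip(episodes, final_positions)
    )
    return successes / len(episodes) if episodes else 0.0
```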
🔎 Similar Papers
No similar papers found.
Xinyuan Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Yonglin Tian
Institute of Automation, Chinese Academy of Sciences
Parallel intelligence, Parallel unmanned systems, Intelligent vehicles, Autonomous driving
Fei Lin
Macau University of Science and Technology
Parallel Intelligence, Large Language Model, Embodied Agent, AI4Science
Yue Liu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Jing Ma
China Ship Research and Development Academy, Beijing 100101, China
Kornélia Sára Szatmáry
Óbuda University, Hungary
Fei-Yue Wang
Professor, Formerly The University of Arizona, Currently Chinese Academy of Sciences
Intelligent Systems, Intelligent Vehicles, Robotics and Automation, Blockchain, DAO