🤖 AI Summary
Existing vision-language navigation (VLN) methods for UAVs focus on coarse-grained, long-range targets and fail to meet the precision requirements of low-altitude last-mile delivery. Method: This paper formally defines and addresses the fine-grained aerial last-mile delivery VLN task. We propose a modular multimodal large model architecture that integrates a lightweight large language model (LLM) with a vision-language model (VLM), enabling joint natural language understanding, floor-level localization, fine-grained object detection, and autonomous action decision-making. We further construct the CARLA-based Vision-Language Delivery (VLD) dataset, the first benchmark tailored for aerial last-mile delivery. Contribution/Results: Extensive end-to-end evaluation and ablation studies on VLD demonstrate significant improvements in navigation accuracy and environmental robustness over baseline approaches.
📝 Abstract
The growing demand for intelligent logistics, particularly fine-grained terminal delivery, underscores the need for autonomous UAV (Unmanned Aerial Vehicle)-based delivery systems. However, most existing last-mile delivery studies rely on ground robots, while current UAV-based Vision-Language Navigation (VLN) tasks primarily focus on coarse-grained, long-range goals, making them unsuitable for precise terminal delivery. To bridge this gap, we propose LogisticsVLN, a scalable aerial delivery system built on multimodal large language models (MLLMs) for autonomous terminal delivery. LogisticsVLN integrates lightweight Large Language Models (LLMs) and Vision-Language Models (VLMs) in a modular pipeline for request understanding, floor localization, object detection, and action decision-making. To support research and evaluation in this new setting, we construct the Vision-Language Delivery (VLD) dataset within the CARLA simulator. Experimental results on the VLD dataset demonstrate the feasibility of the LogisticsVLN system. In addition, we conduct subtask-level evaluations of each module of our system, offering valuable insights for improving the robustness and real-world deployment of foundation model-based vision-language delivery systems.
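The modular pipeline the abstract describes (request understanding, floor localization, object detection, action decision) can be sketched as a chain of swappable stages. The sketch below is purely illustrative: every interface, name, and toy model here is a hypothetical stand-in, not the paper's actual implementation.

```python
# Hypothetical sketch of a modular MLLM delivery pipeline in the spirit of
# LogisticsVLN. Stage names and signatures are illustrative assumptions,
# not the paper's API.
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Modules:
    # Each stage is an independent callable, so a lightweight LLM or VLM
    # backend can be swapped in without touching the rest of the pipeline.
    understand: Callable[[str], Dict[str, Any]]        # request text -> structured goal
    localize_floor: Callable[[Dict, Any], int]         # goal + current view -> floor estimate
    detect: Callable[[Dict, Any], bool]                # goal + current view -> target visible?
    decide: Callable[[Dict, int, bool], str]           # goal + state -> discrete action


def step(mods: Modules, request: str, view: Any) -> str:
    """Run one decision step: parse the request, localize, detect, act."""
    goal = mods.understand(request)
    floor = mods.localize_floor(goal, view)
    found = mods.detect(goal, view)
    return mods.decide(goal, floor, found)


# Toy stand-ins so the sketch runs without any real models.
mods = Modules(
    understand=lambda text: {"floor": 3, "object": "red mailbox"},
    localize_floor=lambda goal, view: view["floor"],
    detect=lambda goal, view: goal["object"] in view["objects"],
    decide=lambda goal, floor, found: (
        "descend_and_deliver" if found and floor == goal["floor"]
        else ("ascend" if floor < goal["floor"] else "search")
    ),
)

action = step(mods, "Deliver to the red mailbox on the 3rd floor",
              {"floor": 2, "objects": []})
print(action)  # -> "ascend"
```

The point of the design is that subtask-level evaluation (as the abstract mentions) falls out naturally: each stage can be tested in isolation by calling it directly.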