FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments

📅 2026-01-12
🤖 AI Summary
This work addresses the challenge of enabling drones to autonomously 3D-scan arbitrary targets in open-world environments, a task still hindered by restrictive assumptions and reliance on human priors. The authors propose FlyCo, a system that, for the first time, deeply integrates vision-language foundation models into a closed-loop perception-prediction-planning pipeline. FlyCo achieves fully automatic, adaptive scanning from only low-effort textual or visual prompts. Through multimodal fusion, it performs zero-shot target understanding and geometric completion while generating flight trajectories that jointly optimize coverage, efficiency, and real-time collision avoidance. Experiments in both real-world and simulated settings show that FlyCo significantly outperforms existing approaches in accuracy, efficiency, and safety, substantially reduces human intervention, and offers an extensible architecture compatible with future foundation models.

📝 Abstract
Autonomous 3D scanning of open-world target structures via drones remains challenging despite broad applications. Existing paradigms rely on restrictive assumptions or effortful human priors, limiting practicality, efficiency, and adaptability. Recent foundation models (FMs) offer great potential to bridge this gap. This paper investigates a critical research problem: What system architecture can effectively integrate FM knowledge for this task? We answer it with FlyCo, a principled FM-empowered perception-prediction-planning loop enabling fully autonomous, prompt-driven 3D target scanning in diverse unknown open-world environments. FlyCo directly translates low-effort human prompts (text, visual annotations) into precise adaptive scanning flights via three coordinated stages: (1) perception fuses streaming sensor data with vision-language FMs for robust target grounding and tracking; (2) prediction distills FM knowledge and combines multi-modal cues to infer the partially observed target's complete geometry; (3) planning leverages predictive foresight to generate efficient and safe paths with comprehensive target coverage. Building on this, we further design key components to boost open-world target grounding efficiency and robustness, enhance prediction quality in terms of shape accuracy, zero-shot generalization, and temporal stability, and balance long-horizon flight efficiency with real-time computability and online collision avoidance. Extensive challenging real-world and simulation experiments show FlyCo delivers precise scene understanding, high efficiency, and real-time safety, outperforming existing paradigms with lower human effort and verifying the proposed architecture's practicality. Comprehensive ablations validate each component's contribution. FlyCo also serves as a flexible, extensible blueprint, readily leveraging future FM and robotics advances. Code will be released.
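The abstract describes a closed perception-prediction-planning loop driven by a human prompt. A minimal sketch of that control flow is below; all class and function names are illustrative assumptions (the paper's code is not yet released), and each stage body is a placeholder for the FM-based components described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of FlyCo's perception-prediction-planning loop.
# Names and signatures are assumptions for illustration, not the authors' API.

@dataclass
class ScanState:
    target_mask: Optional[str] = None        # grounded target region (perception)
    predicted_geometry: Optional[str] = None  # inferred complete shape (prediction)
    trajectory: List[str] = field(default_factory=list)  # planned waypoints (planning)

def perceive(prompt: str, sensor_frame: str) -> str:
    # Stage 1: fuse streaming sensor data with a vision-language FM to
    # ground and track the prompted target (placeholder logic).
    return f"mask({prompt}|{sensor_frame})"

def predict(target_mask: str) -> str:
    # Stage 2: combine FM priors with multimodal cues to complete the
    # partially observed target's geometry (placeholder logic).
    return f"completed({target_mask})"

def plan(predicted_geometry: str) -> List[str]:
    # Stage 3: use predictive foresight to generate an efficient,
    # collision-free coverage path (placeholder logic).
    return [f"waypoint_{i}({predicted_geometry})" for i in range(3)]

def scan_loop(prompt: str, frames: List[str]) -> ScanState:
    state = ScanState()
    for frame in frames:  # closed loop: each observation refines all three stages
        state.target_mask = perceive(prompt, frame)
        state.predicted_geometry = predict(state.target_mask)
        state.trajectory = plan(state.predicted_geometry)
    return state

state = scan_loop("scan the red bridge", ["frame0", "frame1"])
print(len(state.trajectory))  # 3 planned waypoints from the latest prediction
```

The key structural point is that prediction sits between perception and planning, so the planner acts on the inferred complete geometry rather than only on what has been observed so far.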
Problem

Research questions and friction points this paper is trying to address.

autonomous 3D scanning
open-world environments
drone perception
foundation models
target structure reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation Models
Autonomous Drone
3D Structure Scanning
Perception-Prediction-Planning Loop
Open-World Robotics
Chen Feng
Ph.D. candidate, UAV Group, ECE, HKUST
Robotics
Guiyong Zheng
School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Tengkai Zhuang
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Yongqian Wu
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Fangzhan He
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Haojia Li
The Hong Kong University of Science and Technology
Autonomous Navigation
Juepeng Zheng
School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Shaojie Shen
Associate Professor, Hong Kong University of Science and Technology
Robotics
Boyu Zhou
Assistant Professor, SUSTech
Robotics
aerial robots
active perception
mobile manipulation