FlyCo: Foundation Model-Empowered Drones for Autonomous 3D Structure Scanning in Open-World Environments

📅 2026-01-12
🤖 AI Summary
This work addresses the challenge of enabling drones to autonomously 3D-scan arbitrary targets in open-world environments, a task still hindered by restrictive assumptions and reliance on human priors. The authors propose FlyCo, a system that, for the first time, deeply integrates vision-language foundation models into a closed-loop perception-prediction-planning pipeline. FlyCo achieves fully automatic, adaptive scanning from only low-effort textual or visual prompts. Through multimodal fusion, it performs zero-shot target understanding and geometric completion while generating flight trajectories that jointly optimize coverage, efficiency, and real-time collision avoidance. Experiments in both real-world and simulated settings show that FlyCo significantly outperforms existing approaches in accuracy, efficiency, and safety, substantially reduces human intervention, and offers an extensible architecture compatible with future foundation models.

📝 Abstract
Autonomous 3D scanning of open-world target structures via drones remains challenging despite broad applications. Existing paradigms rely on restrictive assumptions or effortful human priors, limiting practicality, efficiency, and adaptability. Recent foundation models (FMs) offer great potential to bridge this gap. This paper investigates a critical research problem: What system architecture can effectively integrate FM knowledge for this task? We answer it with FlyCo, a principled FM-empowered perception-prediction-planning loop enabling fully autonomous, prompt-driven 3D target scanning in diverse unknown open-world environments. FlyCo directly translates low-effort human prompts (text, visual annotations) into precise adaptive scanning flights via three coordinated stages: (1) perception fuses streaming sensor data with vision-language FMs for robust target grounding and tracking; (2) prediction distills FM knowledge and combines multi-modal cues to infer the partially observed target's complete geometry; (3) planning leverages predictive foresight to generate efficient and safe paths with comprehensive target coverage. Building on this, we further design key components to boost open-world target grounding efficiency and robustness, enhance prediction quality in terms of shape accuracy, zero-shot generalization, and temporal stability, and balance long-horizon flight efficiency with real-time computability and online collision avoidance. Extensive challenging real-world and simulation experiments show FlyCo delivers precise scene understanding, high efficiency, and real-time safety, outperforming existing paradigms with lower human effort and verifying the proposed architecture's practicality. Comprehensive ablations validate each component's contribution. FlyCo also serves as a flexible, extensible blueprint, readily leveraging future FM and robotics advances. Code will be released.
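The abstract describes a closed perception-prediction-planning loop driven by a human prompt. A minimal sketch of that control flow is below; all class and function names are illustrative assumptions (the paper's code is not yet released), and each stage body is a placeholder for the FM-based components described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sketch of FlyCo's perception-prediction-planning loop.
# Names and signatures are assumptions for illustration, not the authors' API.

@dataclass
class ScanState:
    target_mask: Optional[str] = None        # grounded target region (perception)
    predicted_geometry: Optional[str] = None  # inferred complete shape (prediction)
    trajectory: List[str] = field(default_factory=list)  # planned waypoints (planning)

def perceive(prompt: str, sensor_frame: str) -> str:
    # Stage 1: fuse streaming sensor data with a vision-language FM to
    # ground and track the prompted target (placeholder logic).
    return f"mask({prompt}|{sensor_frame})"

def predict(target_mask: str) -> str:
    # Stage 2: combine FM priors with multimodal cues to complete the
    # partially observed target's geometry (placeholder logic).
    return f"completed({target_mask})"

def plan(predicted_geometry: str) -> List[str]:
    # Stage 3: use predictive foresight to generate an efficient,
    # collision-free coverage path (placeholder logic).
    return [f"waypoint_{i}({predicted_geometry})" for i in range(3)]

def scan_loop(prompt: str, frames: List[str]) -> ScanState:
    state = ScanState()
    for frame in frames:  # closed loop: each observation refines all three stages
        state.target_mask = perceive(prompt, frame)
        state.predicted_geometry = predict(state.target_mask)
        state.trajectory = plan(state.predicted_geometry)
    return state

state = scan_loop("scan the red bridge", ["frame0", "frame1"])
print(len(state.trajectory))  # 3 planned waypoints from the latest prediction
```

The key structural point is that prediction sits between perception and planning, so the planner acts on the inferred complete geometry rather than only on what has been observed so far.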
Problem

Research questions and friction points this paper is trying to address.

autonomous 3D scanning
open-world environments
drone perception
foundation models
target structure reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation Models
Autonomous Drone
3D Structure Scanning
Perception-Prediction-Planning Loop
Open-World Robotics
Chen Feng
Ph.D. candidate, UAV Group, ECE, HKUST
Robotics
Guiyong Zheng
School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Tengkai Zhuang
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Yongqian Wu
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Fangzhan He
Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China
Haojia Li
The Hong Kong University of Science and Technology
Autonomous Navigation
Juepeng Zheng
School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Shaojie Shen
Associate Professor, Hong Kong University of Science and Technology
Robotics
Boyu Zhou
Assistant Professor, SUSTech
Robotics
aerial robots
active perception
mobile manipulation