Falcon: A Remote Sensing Vision-Language Foundation Model

📅 2025-03-14
🤖 AI Summary
To address the lack of unified, prompt-driven vision-language foundation models tailored for remote sensing and capable of supporting diverse downstream tasks (e.g., classification, detection, segmentation, captioning), this work introduces Falcon: the first lightweight (0.7B parameters), multi-granularity (image-/region-/pixel-level) vision-language foundation model designed specifically for remote sensing. The authors also build Falcon_SFT, a large-scale, remote-sensing-specific instruction-tuning dataset comprising approximately 78 million high-quality samples with hierarchical annotations, checked through manual sampling verification. Falcon integrates multi-scale image encoding, hierarchical instruction alignment, and cross-modal prompt learning. Evaluated on 67 remote sensing datasets spanning 14 tasks, Falcon achieves strong performance and generalization. To foster community advancement, the code, dataset, and model weights are openly released.

📝 Abstract
This paper introduces Falcon, a holistic vision-language foundation model tailored for remote sensing. Falcon offers a unified, prompt-based paradigm that effectively executes comprehensive and complex remote sensing tasks, and it demonstrates powerful understanding and reasoning abilities at the image, region, and pixel levels. Specifically, given simple natural language instructions and remote sensing images, Falcon can produce impressive results in text form across 14 distinct tasks, including image classification, object detection, segmentation, and image captioning. To facilitate Falcon's training and strengthen its capacity to encode rich spatial and semantic information, we developed Falcon_SFT, a large-scale, multi-task, instruction-tuning dataset for remote sensing. Falcon_SFT consists of approximately 78 million high-quality data samples, covering 5.6 million multi-spatial-resolution and multi-view remote sensing images with diverse instructions. It features hierarchical annotations and undergoes manual sampling verification to ensure high data quality and reliability. Extensive comparative experiments show that Falcon achieves remarkable performance across 67 datasets and 14 tasks, despite having only 0.7B parameters. We release the complete dataset, code, and model weights at https://github.com/TianHuiLab/Falcon to support further development of the open-source community.
Problem

Research questions and friction points this paper is trying to address.

Develops Falcon, a vision-language foundation model for remote sensing tasks.
Enables image-, region-, and pixel-level remote sensing analysis via natural language instructions.
Introduces Falcon_SFT, a large-scale instruction-tuning dataset of approximately 78 million samples used to train the model.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified prompt-based paradigm for remote sensing tasks
Large-scale multi-task instruction-tuning dataset Falcon_SFT
Strong performance across 14 tasks and 67 datasets with only 0.7B parameters
Kelu Yao
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Nuo Xu
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Rong Yang
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Yingying Xu
Zhejiang Lab
Medical Image Processing
Zhuoyan Gao
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Titinunt Kitrungrotsakul
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Yi Ren
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Pu Zhang
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Jin Wang
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Ning Wei
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China
Chao Li
Research Center for Space Computing System, ZhejiangLab, Hangzhou, China