SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the underuse of spectral information and the resulting limited pixel-level performance of vision-language models on multispectral remote sensing imagery, this paper makes three contributions: (1) SPIE, a vision-language instruction-following dataset that encodes spectral priors of land-cover objects as textual attributes derived from classical spectral indices; (2) SPEX, a multimodal LLM that combines multiscale feature aggregation, token context condensation, and multispectral visual pre-training, described by the authors as the first vision-language model dedicated to land cover extraction in spectral remote sensing imagery; and (3) support for both instruction-driven pixel-level extraction and textual explanations of the predictions. On five public multispectral datasets, SPEX consistently outperforms existing state-of-the-art methods on typical land-cover categories such as vegetation, buildings, and water bodies, and the generated explanations improve interpretability.

📝 Abstract
Spectral information has long been recognized as a critical cue in remote sensing observations. Although numerous vision-language models have been developed for pixel-level interpretation, spectral information remains underutilized, resulting in suboptimal performance, particularly in multispectral scenarios. To address this limitation, we construct a vision-language instruction-following dataset named SPIE, which encodes spectral priors of land-cover objects into textual attributes recognizable by large language models (LLMs), based on classical spectral index computations. Leveraging this dataset, we propose SPEX, a multimodal LLM designed for instruction-driven land cover extraction. To this end, we introduce several carefully designed components and training strategies, including multiscale feature aggregation, token context condensation, and multispectral visual pre-training, to achieve precise and flexible pixel-level interpretation. To the best of our knowledge, SPEX is the first multimodal vision-language model dedicated to land cover extraction in spectral remote sensing imagery. Extensive experiments on five public multispectral datasets demonstrate that SPEX consistently outperforms existing state-of-the-art methods in extracting typical land cover categories such as vegetation, buildings, and water bodies. Moreover, SPEX is capable of generating textual explanations for its predictions, thereby enhancing interpretability and user-friendliness. Code will be released at: https://github.com/MiliLab/SPEX.
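The abstract says SPIE encodes spectral priors into textual attributes using classical spectral index computations, but it does not give the details. The following is only a minimal sketch of that general idea, not the authors' pipeline: it computes two standard indices, NDVI = (NIR - Red) / (NIR + Red) and NDWI = (Green - NIR) / (Green + NIR), over a region and turns them into a short text attribute. The function names, the dictionary layout, the thresholds (0.3 and 0.0), and the wording of the descriptions are all assumptions made for illustration.

```python
from statistics import mean


def normalized_difference(a, b, eps=1e-9):
    """Return (a - b) / (a + b); eps guards against a zero denominator."""
    return (a - b) / (a + b + eps)


def spectral_attributes(pixels, ndvi_thresh=0.3, ndwi_thresh=0.0):
    """Summarize a region as spectral indices plus a text description.

    pixels: list of dicts with 'green', 'red', and 'nir' reflectances in [0, 1].
    The thresholds are illustrative assumptions, not values from the paper.
    """
    ndvi = mean(normalized_difference(p["nir"], p["red"]) for p in pixels)
    ndwi = mean(normalized_difference(p["green"], p["nir"]) for p in pixels)

    if ndvi > ndvi_thresh:
        description = "vegetation-like (high NDVI)"
    elif ndwi > ndwi_thresh:
        description = "water-like (positive NDWI)"
    else:
        description = "non-vegetated, non-water surface"

    return {"ndvi": round(ndvi, 2), "ndwi": round(ndwi, 2),
            "description": description}


def to_prompt_text(attrs):
    """Render the attributes as a sentence an LLM could read (hypothetical format)."""
    return (f"Region has NDVI {attrs['ndvi']} and NDWI {attrs['ndwi']}; "
            f"it appears {attrs['description']}.")


# Hypothetical reflectance samples for two regions.
forest = [{"green": 0.10, "red": 0.10, "nir": 0.50}] * 4
lake = [{"green": 0.10, "red": 0.05, "nir": 0.02}] * 4

print(to_prompt_text(spectral_attributes(forest)))
print(to_prompt_text(spectral_attributes(lake)))
```

In a setup like this, the rendered sentence would accompany the image in an instruction-following sample, so the language model sees the spectral cue as plain text rather than as raw band values.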
Problem

Research questions and friction points this paper is trying to address.

Underutilized spectral info in remote sensing image analysis
Need for precise pixel-level land cover extraction
Lack of interpretable vision-language models for multispectral data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language model for spectral land cover extraction
Multiscale feature aggregation and token condensation
Multispectral visual pre-training for precise interpretation
Authors
Dongchen Si
Xinjiang University, Urumqi, 830046, Xinjiang, China
Di Wang
School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, China
Erzhong Gao
iFlytek Co., Ltd, Hefei, 230088, Anhui, China
Xiaolei Qin
Wuhan University
Remote sensing
Liu Zhao
iFlytek Co., Ltd, Hefei, 230088, Anhui, China
Jing Zhang
School of Computer Science, Wuhan University, Wuhan, 430072, Hubei, China
Minqiang Xu
iFlytek Co., Ltd, Hefei, 230088, Anhui, China
Jianbo Zhan
iFlytek Co., Ltd, Hefei, 230088, Anhui, China
Jianshe Wang
iFlytek Co., Ltd, Hefei, 230088, Anhui, China
Lin Liu
iFlytek Co., Ltd, Hefei, 230088, Anhui, China
Bo Du
Department of Management, Griffith Business School
Sustainable Transport, Travel Behaviour, Urban Data Analytics, Logistics and Supply Chain
Liangpei Zhang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, Hubei, China