OSDA: A Framework for Open-Set Discovery and Automatic Interpretation of Land-cover in Remote Sensing Imagery

📅 2025-09-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the dual challenges of fine-grained localization and semantic open-set classification in remote sensing imagery, this paper proposes the first unsupervised three-stage framework for land cover analysis. First, a fine-tuned Segment Anything Model (SAM) enables label-free pixel-level mask extraction. Second, a two-stage fine-tuning strategy adapts a multimodal large language model (MLLM) to automatically generate semantic names and contextual descriptions for novel land cover classes. Third, an LLM-as-judge mechanism evaluates the plausibility of generated descriptions. This work pioneers the deep integration of MLLMs into the land cover understanding pipeline, achieving high-precision segmentation and interpretable semantic outputs without any human annotation. Experiments on diverse satellite imagery demonstrate strong generalization capability and human-readable outputs, significantly enhancing the practicality and scalability of automated cartographic updating and large-scale Earth observation analytics.

📝 Abstract
Open-set land-cover analysis in remote sensing requires the ability to achieve fine-grained spatial localization and semantically open categorization. This involves not only detecting and segmenting novel objects without categorical supervision but also assigning them interpretable semantic labels through multimodal reasoning. In this study, we introduce OSDA, an integrated three-stage framework for annotation-free open-set land-cover discovery, segmentation, and description. The pipeline consists of: (1) precise discovery and mask extraction with a promptable fine-tuned segmentation model (SAM), (2) semantic attribution and contextual description via a two-phase fine-tuned multimodal large language model (MLLM), and (3) LLM-as-judge and manual scoring of the MLLM outputs. By combining pixel-level accuracy with high-level semantic understanding, OSDA addresses key challenges in open-world remote sensing interpretation. Designed to be architecture-agnostic and label-free, the framework supports robust evaluation across diverse satellite imagery without requiring manual annotation. Our work provides a scalable and interpretable solution for dynamic land-cover monitoring, showing strong potential for automated cartographic updating and large-scale Earth observation analysis.
Problem

Research questions and friction points this paper is trying to address.

Achieving fine-grained spatial localization for open-set land-cover analysis
Detecting and segmenting novel objects without categorical supervision
Assigning interpretable semantic labels through multimodal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned SAM model for precise segmentation
Two-phase MLLM for semantic attribution
Architecture-agnostic framework without manual annotation
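The three stages above compose as a simple discover-describe-judge loop. The sketch below is a minimal structural illustration, not the authors' implementation: `discover_masks`, `describe_mask`, and `judge` are hypothetical stand-ins for the fine-tuned SAM, the two-phase MLLM, and the LLM-as-judge, respectively (here stubbed with trivial logic so the control flow is runnable).

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Mask:
    region_id: int
    value: int                      # placeholder "land-cover signature"
    pixels: List[Tuple[int, int]]   # (row, col) coordinates

@dataclass
class Description:
    region_id: int
    label: str
    text: str

def discover_masks(image: List[List[int]]) -> List[Mask]:
    """Stage 1 stand-in: a fine-tuned SAM would propose class-agnostic
    masks; here we simply group pixels by raw value."""
    regions = {}
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            regions.setdefault(v, []).append((r, c))
    return [Mask(i, v, px) for i, (v, px) in enumerate(sorted(regions.items()))]

def describe_mask(mask: Mask) -> Description:
    """Stage 2 stand-in: a two-phase fine-tuned MLLM would name the novel
    class and describe its context; here we emit placeholder text."""
    return Description(mask.region_id,
                       label=f"novel_class_{mask.value}",
                       text=f"region covering {len(mask.pixels)} pixels")

def judge(desc: Description) -> float:
    """Stage 3 stand-in: an LLM-as-judge would score plausibility;
    here we return 1.0 for any non-empty label and description."""
    return 1.0 if desc.label and desc.text else 0.0

def osda_pipeline(image: List[List[int]]) -> List[Tuple[Description, float]]:
    """Chain the three stages: discover masks, describe each, score each."""
    results = []
    for mask in discover_masks(image):
        desc = describe_mask(mask)
        results.append((desc, judge(desc)))
    return results
```

The value of the design is that each stage is swappable: any promptable segmenter can replace stage 1 and any captioning MLLM can replace stage 2, which is what the abstract means by "architecture-agnostic".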
👥 Authors
Siyi Chen, Johns Hopkins University (JHU)
Kai Wang, The University of Hong Kong (HKU)
Weicong Pang, National University of Singapore (NUS)
Ruiming Yang, National University of Singapore (NUS)
Ziru Chen, The Ohio State University
Renjun Gao, Macau University of Science and Technology (MUST)
Alexis Kai Hon Lau, The Hong Kong University of Science and Technology (HKUST)
Dasa Gu, The Hong Kong University of Science and Technology (HKUST)
Chenchen Zhang, The Hong Kong University of Science and Technology (HKUST)
Cheng Li, The Hong Kong University of Science and Technology (HKUST)