AI Summary
Existing land use and land cover (LULC) mapping models suffer from poor generalizability, heavy reliance on strong supervision, and difficulty adapting to multimodal remote sensing data and heterogeneous classification schemas. Both task-agnostic and task-specific foundation models in remote sensing face bottlenecks including high fine-tuning costs and severe label scarcity.
Method: We propose a flexible foundation model for LULC mapping: (i) constructing LAS, a large-scale weakly supervised multimodal dataset; (ii) designing remote sensing-specific adapters and a text-enhancement module to fuse cross-modal features and semantic priors; and (iii) introducing a class-confidence-guided weighted fusion strategy to boost zero-shot transferability.
Contribution/Results: Evaluated across six heterogeneous datasets (covering optical, SAR, and LiDAR modalities under diverse classification systems), the model significantly outperforms state-of-the-art methods, demonstrating superior generalization, especially to unseen modalities and novel classes, without requiring task-specific annotations.
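As a rough illustration of step (i), weakly supervised samples can be built by pairing imagery tiles with co-registered label tiles taken from an existing LULC product. The sketch below is hypothetical: the `WeakSample` structure, the `pair_weak_labels` helper, and the nodata code 255 are illustrative assumptions, not details from the paper.

```python
from dataclasses import dataclass
import numpy as np

NODATA = 255  # assumed nodata code; real LULC products vary


@dataclass
class WeakSample:
    image: np.ndarray  # (bands, H, W) sensor tile
    label: np.ndarray  # (H, W) class ids from an existing LULC product
    source: str        # name of the product supplying the weak label


def pair_weak_labels(tiles, label_tiles, product_name):
    """Pair imagery tiles with co-registered weak-label tiles,
    discarding tiles whose label raster is entirely nodata."""
    return [
        WeakSample(img, lab, product_name)
        for img, lab in zip(tiles, label_tiles)
        if not np.all(lab == NODATA)
    ]
```

The appeal of this scheme is that labels come "for free" from published products, trading some label noise for global scale, which is the trade-off weak supervision accepts.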
Abstract
Land Use and Land Cover (LULC) mapping is a fundamental task in Earth Observation (EO). However, current LULC models are typically developed for a specific modality and a fixed class taxonomy, limiting their generalizability and broader applicability. Recent advances in foundation models (FMs) offer promising opportunities for building universal models. Yet task-agnostic FMs often require fine-tuning for downstream applications, whereas task-specific FMs rely on massive amounts of labeled data for training, which is costly and impractical in the remote sensing (RS) domain. To address these challenges, we propose LandSegmenter, an LULC FM framework that resolves challenges at the input, model, and output levels. On the input side, to alleviate the heavy demand for labeled data in FM training, we introduce LAnd Segment (LAS), a large-scale, multi-modal, multi-source dataset built primarily from globally sampled weak labels drawn from existing LULC products. LAS provides a scalable, cost-effective alternative to manual annotation, enabling large-scale FM training across diverse LULC domains. For the model architecture, LandSegmenter integrates an RS-specific adapter for cross-modal feature extraction and a text encoder for enhanced semantic awareness. At the output stage, we introduce a class-wise confidence-guided fusion strategy that mitigates semantic omissions and further improves LandSegmenter's zero-shot performance. We evaluate LandSegmenter on six precisely annotated LULC datasets spanning diverse modalities and class taxonomies. Extensive transfer learning and zero-shot experiments demonstrate that LandSegmenter achieves competitive or superior performance, particularly in zero-shot settings when transferred to unseen datasets. These results highlight the efficacy of our framework and the utility of weak supervision for building task-specific FMs.
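The output-stage idea of class-wise confidence-guided fusion might look roughly like the sketch below: each class channel from each prediction source is weighted by that source's mean confidence for the class before the weighted maps are combined. This is a minimal illustration under assumed inputs (softmax probability maps), not the paper's exact formulation; the function name and weighting rule are hypothetical.

```python
import numpy as np


def class_confidence_fusion(prob_maps):
    """Fuse several (C, H, W) softmax probability maps (e.g., from
    different prompts or modalities) by weighting each class channel
    with that source's class-wise mean confidence, then normalizing
    and taking the per-pixel argmax.

    Hypothetical sketch: the actual strategy in the paper may differ.
    """
    fused = np.zeros_like(prob_maps[0], dtype=np.float64)
    weight_sum = np.zeros(prob_maps[0].shape[0], dtype=np.float64)
    for p in prob_maps:
        # class-wise confidence: mean probability of each class over all pixels
        conf = p.reshape(p.shape[0], -1).mean(axis=1)
        fused += conf[:, None, None] * p
        weight_sum += conf
    fused /= weight_sum[:, None, None]
    return fused, fused.argmax(axis=0)
```

Because each fused channel is a convex combination of the input channels, the result stays in [0, 1]; upweighting confident class channels is one simple way to keep rarely predicted but high-confidence classes from being drowned out, mitigating the semantic omissions the abstract mentions.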