BotaCLIP: Contrastive Learning for Botany-Aware Representation of Earth Observation Data

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pretrained remote sensing foundation models struggle to incorporate domain-specific knowledge (e.g., botany) efficiently without suffering catastrophic forgetting. To address this, we propose BotaCLIP, a lightweight multimodal contrastive learning framework that avoids full retraining and instead uses regularized fine-tuning to align high-resolution aerial imagery with ground-truth vegetation plot data, constructing a botany-aware embedding space on top of pretrained Earth observation models (e.g., DOFA). Its core contribution lies in decoupling domain-knowledge injection from representation transfer, jointly optimizing semantic alignment and preservation of prior task performance. Evaluated on three ecological downstream tasks—plant presence prediction, butterfly distribution modeling, and soil trophic group abundance estimation—BotaCLIP consistently outperforms both the original DOFA and supervised baselines. The results demonstrate its ecological interpretability, cross-task generalizability, and computational efficiency.
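The objective described above can be sketched as a CLIP-style symmetric contrastive loss between paired image and vegetation-plot embeddings, plus an anchor penalty pulling the adapted image embeddings toward the frozen pretrained (e.g., DOFA) embeddings to limit catastrophic forgetting. This is a minimal illustration under stated assumptions: the function and parameter names, the InfoNCE formulation, and the L2 anchor term are assumptions for exposition, not the paper's exact regularization.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax_xent(logits, targets):
    # Numerically stable row-wise softmax cross-entropy against integer targets.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def botaclip_loss(img_emb, plot_emb, frozen_img_emb,
                  temperature=0.07, reg_weight=1.0):
    """Symmetric InfoNCE between paired aerial-image / vegetation-plot
    embeddings, plus an L2 anchor toward the frozen pretrained image
    embeddings (hypothetical stand-in for the paper's regularization)."""
    zi = l2_normalize(img_emb)
    zp = l2_normalize(plot_emb)
    logits = zi @ zp.T / temperature       # (N, N) similarity matrix
    targets = np.arange(len(zi))           # matching pairs lie on the diagonal
    contrastive = 0.5 * (softmax_xent(logits, targets) +
                         softmax_xent(logits.T, targets))
    # Anchor term: penalize drift from the pretrained representation.
    anchor = np.mean(np.sum((img_emb - frozen_img_emb) ** 2, axis=1))
    return contrastive + reg_weight * anchor
```

The `reg_weight` knob trades off alignment to botanical data against preservation of the pretrained embedding space; setting it to zero recovers plain contrastive fine-tuning.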

📝 Abstract
Foundation models have demonstrated a remarkable ability to learn rich, transferable representations across diverse modalities such as images, text, and audio. In modern machine learning pipelines, these representations often replace raw data as the primary input for downstream tasks. In this paper, we address the challenge of adapting a pre-trained foundation model to inject domain-specific knowledge, without retraining from scratch or incurring significant computational costs. To this end, we introduce BotaCLIP, a lightweight multimodal contrastive framework that adapts a pre-trained Earth Observation foundation model (DOFA) by aligning high-resolution aerial imagery with botanical relevés. Unlike generic embeddings, BotaCLIP internalizes ecological structure through contrastive learning with a regularization strategy that mitigates catastrophic forgetting. Once trained, the resulting embeddings serve as transferable representations for downstream predictors. Motivated by real-world applications in biodiversity modeling, we evaluated BotaCLIP representations in three ecological tasks: plant presence prediction, butterfly occurrence modeling, and soil trophic group abundance estimation. The results showed consistent improvements over those derived from DOFA and supervised baselines. More broadly, this work illustrates how domain-aware adaptation of foundation models can inject expert knowledge into data-scarce settings, enabling frugal representation learning.
Problem

Research questions and friction points this paper is trying to address.

Adapting pre-trained foundation models to inject domain-specific knowledge efficiently
Aligning high-resolution aerial imagery with botanical data using contrastive learning
Enabling transferable representations for ecological tasks like biodiversity modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight multimodal contrastive learning framework
Aligns aerial imagery with botanical relevés
Regularization strategy mitigates catastrophic forgetting