LD-SDM: Language-Driven Hierarchical Species Distribution Modeling

📅 2023-12-13

🏛️ arXiv.org

📈 Citations: 6

✨ Influential: 2


🤖 AI Summary

This work addresses global-scale species distribution modeling (SDM) from presence-only occurrence records. Rather than mapping a species' range from geographical and environmental features alone, we encode each species' taxonomic hierarchy with a large language model, yielding hierarchical semantic embeddings that capture implicit relationships between species. Trained with multi-label learning losses, the resulting model maps ranges at any taxonomic rank, including zero-shot prediction for unseen species, without additional supervision. We also introduce a proximity-aware metric that evaluates SDMs against any pixel-level representation of a ground-truth range map, penalizing predictions according to their proximity to the ground truth. Experiments on species range prediction, zero-shot prediction, and geo-feature regression show that the model outperforms strong state-of-the-art baselines across a variety of multi-label learning losses.
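The core idea of the pipeline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the rank list, the prompt format, the dot-product scoring head, and the helpers toy_text_embedding and toy_location_features are all hypothetical, and the hash-based embedding merely stands in for a real large language model encoder. What it shows is how truncating a taxonomic pathway at any rank yields a text prompt, which is embedded and scored against location features to give a presence probability.

```python
import hashlib
import math

# Standard Linnaean ranks, from broadest to most specific.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def taxonomic_prompt(pathway: dict, rank: str) -> str:
    """Build a text prompt from the taxonomic pathway, truncated at `rank`.

    Truncation lets one text encoder describe a genus or family as easily
    as a species, so ranges can be queried at any rank.
    """
    depth = RANKS.index(rank) + 1
    return " ".join(pathway[r] for r in RANKS[:depth])


def toy_text_embedding(text: str, dim: int = 8) -> list:
    """HYPOTHETICAL stand-in for an LLM text encoder.

    Derives a deterministic unit vector from a SHA-256 hash (dim <= 32).
    A real system would call a pretrained language model here.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [(b / 255.0) * 2.0 - 1.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def toy_location_features(lat: float, lon: float, dim: int = 8) -> list:
    """HYPOTHETICAL multi-scale sinusoidal encoding of a (lat, lon) point."""
    feats = []
    for k in range(dim // 4):
        scale = 2 ** k
        for angle in (math.radians(lat), math.radians(lon)):
            feats += [math.sin(scale * angle), math.cos(scale * angle)]
    return feats


def presence_score(text_emb: list, loc_feat: list) -> float:
    """ASSUMED scoring head: sigmoid of the text/location dot product."""
    logit = sum(t * l for t, l in zip(text_emb, loc_feat))
    return 1.0 / (1.0 + math.exp(-logit))


pathway = {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
           "order": "Carnivora", "family": "Felidae", "genus": "Puma",
           "species": "concolor"}
genus_prompt = taxonomic_prompt(pathway, "genus")
p = presence_score(toy_text_embedding(genus_prompt),
                   toy_location_features(-15.0, -60.0))
```

Because a genus prompt is a prefix of each of its species prompts, the same encoder can be queried for a genus or for a species that never appeared in training, which is the property the summary attributes to the language-driven design.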

📝 Abstract

We focus on the problem of species distribution modeling using global-scale presence-only data. Most previous studies have mapped the range of a given species using geographical and environmental features alone. To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model. This enables range mapping for any taxonomic rank and unseen species without additional supervision. Further, we propose a novel proximity-aware evaluation metric that enables evaluating species distribution models using any pixel-level representation of a ground-truth species range map. The proposed metric penalizes the predictions of a model based on their proximity to the ground truth. We demonstrate the effectiveness of our model by systematically evaluating it on the tasks of species range prediction, zero-shot prediction and geo-feature regression against the state-of-the-art. Results show our model outperforms the strong baselines when trained with a variety of multi-label learning losses.
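The abstract says the model is trained with a variety of multi-label learning losses but does not name them. As one hedged example of such a loss for presence-only data, the sketch below implements an "assume negative" objective: the observed species is treated as present, every other species at that location and every species at a random background location is treated as absent, and the single positive term is up-weighted by pos_weight. The function name, its arguments, and the default weight are illustrative assumptions, not details taken from the paper.

```python
import math

EPS = 1e-6  # keeps log() away from log(0)


def assume_negative_loss(probs_at_obs, probs_at_random, pos_index, pos_weight=4.0):
    """HYPOTHETICAL 'assume negative' multi-label loss for one presence record.

    probs_at_obs:    predicted presence probability for every class at the
                     observed location.
    probs_at_random: predicted probabilities at a randomly sampled
                     background location (all treated as absences).
    pos_index:       index of the class that was actually observed.
    pos_weight:      illustrative weight on the single positive term.
    """
    n = len(probs_at_obs)
    total = 0.0
    for j, p in enumerate(probs_at_obs):
        if j == pos_index:
            # The observed species is a confirmed presence.
            total -= pos_weight * math.log(p + EPS)
        else:
            # Unobserved classes at this location are assumed absent.
            total -= math.log(1.0 - p + EPS)
    for p in probs_at_random:
        # Every class at the background location is assumed absent.
        total -= math.log(1.0 - p + EPS)
    return total / n


good = assume_negative_loss([0.9, 0.05, 0.05], [0.05, 0.05, 0.05], pos_index=0)
bad = assume_negative_loss([0.1, 0.8, 0.8], [0.7, 0.7, 0.7], pos_index=0)
```

The up-weighting exists because each record contributes one positive term against many assumed negatives; without it, a model could lower the loss by predicting absence everywhere.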

Problem

Research questions and friction points this paper is trying to address.

Modeling species distributions with global presence-only data

Integrating taxonomic classification via language models

Developing proximity-aware evaluation metrics for SDMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Encoding the taxonomic hierarchy of species with a large language model

Mapping ranges at any taxonomic rank, including unseen species, without additional supervision

Introducing a proximity-aware evaluation metric for SDMs
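The proximity-aware metric is only described at a high level here (it penalizes predictions according to their proximity to the ground truth), so the following is a hypothetical instance of the idea rather than the paper's definition. It works on binary pixel grids, charges a false-positive pixel 1 - exp(-d/tau) based on its distance d to the nearest true-range pixel, charges each missed true pixel the full cost of 1, and averages over every pixel that is predicted or truly present. The function names and the decay parameter tau are assumptions.

```python
import math


def nearest_distance(cell, targets):
    """Euclidean distance (in pixels) from `cell` to the closest target cell."""
    r, c = cell
    return min(math.hypot(r - tr, c - tc) for tr, tc in targets)


def proximity_penalty(pred, truth, tau=2.0):
    """HYPOTHETICAL proximity-aware error for binary range maps.

    pred, truth: 2-D lists of 0/1 values with the same shape.
    - A predicted-present pixel costs 1 - exp(-d / tau), where d is its
      distance to the nearest true-present pixel (0 inside the true range),
      so near misses cost less than distant false positives.
    - A true-present pixel the model missed costs 1.
    Returns the mean cost over the union of predicted and true pixels
    (0.0 is perfect; larger is worse, at most 1.0).
    """
    pred_cells = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    true_cells = {(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v}
    if not pred_cells and not true_cells:
        return 0.0
    if not true_cells:
        return 1.0  # every prediction is a false positive with no range nearby
    costs = []
    for cell in pred_cells:
        d = 0.0 if cell in true_cells else nearest_distance(cell, true_cells)
        costs.append(1.0 - math.exp(-d / tau))
    costs.extend(1.0 for _ in true_cells - pred_cells)  # missed true pixels
    return sum(costs) / len(costs)


truth = [[0] * 8, [0, 1, 1, 0, 0, 0, 0, 0], [0] * 8]
near = [[0] * 8, [0, 1, 1, 1, 0, 0, 0, 0], [0] * 8]  # false positive 1 px away
far = [[0] * 8, [0, 1, 1, 0, 0, 0, 0, 1], [0] * 8]   # false positive 5 px away
```

Here the near miss scores better than the distant false positive, unlike plain pixel accuracy, which would treat both errors identically. The brute-force nearest-pixel search is quadratic and only suitable for small grids; for global-resolution maps a Euclidean distance transform would be the usual choice.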
