GlobalGeoTree: A Multi-Granular Vision-Language Dataset for Global Tree Species Classification

📅 2025-05-18
🤖 AI Summary
Global tree species classification is hindered by the scarcity of large-scale, geographically aligned, multimodal annotated data. To address this, we introduce GlobalGeoTree, the first global-scale, multi-granularity vision-language dataset for tree species. It comprises 21,001 species and 6.3 million geotagged samples, each paired with a Sentinel-2 image time series and 27 auxiliary environmental variables, and it supports hierarchical classification at the species, genus, and family levels. We propose GeoTreeCLIP, an ecology-aware multimodal model that jointly encodes remote sensing time series, precise geographic coordinates, heterogeneous environmental features, and hierarchical textual labels; it incorporates geo-aware prompting and multi-granularity label alignment. Evaluated on the GlobalGeoTree-10kEval benchmark, our method achieves zero-shot and 5-shot classification accuracies surpassing prior state-of-the-art by 12.7% and 9.4%, respectively. This advance strengthens automated global tree species identification and biodiversity monitoring.

📝 Abstract
Global tree species mapping using remote sensing data is vital for biodiversity monitoring, forest management, and ecological research. However, progress in this field has been constrained by the scarcity of large-scale, labeled datasets. To address this, we introduce GlobalGeoTree, a comprehensive global dataset for tree species classification. GlobalGeoTree comprises 6.3 million geolocated tree occurrences, spanning 275 families, 2,734 genera, and 21,001 species across the hierarchical taxonomic levels. Each sample is paired with Sentinel-2 image time series and 27 auxiliary environmental variables, encompassing bioclimatic, geographic, and soil data. The dataset is partitioned into GlobalGeoTree-6M for model pretraining and curated evaluation subsets, primarily GlobalGeoTree-10kEval for zero-shot and few-shot benchmarking. To demonstrate the utility of the dataset, we introduce a baseline model, GeoTreeCLIP, which leverages paired remote sensing data and taxonomic text labels within a vision-language framework pretrained on GlobalGeoTree-6M. Experimental results show that GeoTreeCLIP achieves substantial improvements in zero- and few-shot classification on GlobalGeoTree-10kEval over existing advanced models. By making the dataset, models, and code publicly available, we aim to establish a benchmark to advance tree species classification and foster innovation in biodiversity research and ecological applications.
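
The zero-shot setting described in the abstract can be illustrated with a minimal sketch: in a CLIP-style framework, a sample embedding is compared against text embeddings of candidate taxonomic labels, and the most similar label is predicted. The embeddings below are made-up toy vectors and the function names are hypothetical; this is not the GeoTreeCLIP implementation, only an illustration of similarity-based label matching.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(sample_embedding, label_embeddings):
    """Return the label whose text embedding is most similar to the sample.

    label_embeddings maps a label string to its (hypothetical) text embedding.
    """
    scores = {
        label: cosine(sample_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy 3-dimensional embeddings (assumed values, for illustration only).
label_embeddings = {
    "Quercus robur": [0.9, 0.1, 0.0],
    "Pinus sylvestris": [0.1, 0.9, 0.2],
    "Fagus sylvatica": [0.2, 0.1, 0.9],
}
sample_embedding = [0.8, 0.2, 0.1]

predicted, scores = zero_shot_predict(sample_embedding, label_embeddings)
print(predicted)
```

No class-specific training is involved in this step: adding a new candidate species only requires an embedding of its label text, which is what makes the approach usable for classes unseen during pretraining.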
Problem

Research questions and friction points this paper is trying to address.

Global tree species classification lacks large-scale labeled datasets.
The field needs a comprehensive dataset with multi-granular taxonomic labels and environmental data.
Benchmark models are needed for zero-shot and few-shot tree species classification.
Innovation

Methods, ideas, or system contributions that make the work stand out.

GlobalGeoTree dataset with 6.3M geolocated tree samples
Sentinel-2 time series and 27 environmental variables
GeoTreeCLIP model for vision-language classification
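
To make the structure of one sample concrete, the following sketch models a record with a location, a three-level taxonomic label, a Sentinel-2 time series, and 27 environmental variables. The class name, field names, and array layout are assumptions made for illustration; they do not reflect the released dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

NUM_ENV_VARS = 27  # stated in the abstract


@dataclass
class TreeSample:
    """Hypothetical container for one geolocated tree occurrence."""

    latitude: float
    longitude: float
    family: str
    genus: str
    species: str
    s2_series: List[List[float]]  # assumed layout: [time step][band]
    env_vars: List[float]         # bioclimatic, geographic, and soil values

    def __post_init__(self):
        # Reject records that do not carry exactly 27 environmental variables.
        if len(self.env_vars) != NUM_ENV_VARS:
            raise ValueError(
                f"expected {NUM_ENV_VARS} environmental variables, "
                f"got {len(self.env_vars)}"
            )

    def label_at(self, level: str) -> str:
        """Return the taxonomic label at 'family', 'genus', or 'species'."""
        if level not in ("family", "genus", "species"):
            raise ValueError(f"unknown taxonomic level: {level}")
        return getattr(self, level)


sample = TreeSample(
    latitude=51.5,
    longitude=-0.1,
    family="Fagaceae",
    genus="Quercus",
    species="Quercus robur",
    s2_series=[[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]],  # toy values
    env_vars=[0.0] * NUM_ENV_VARS,
)
print(sample.label_at("genus"))
```

Keeping all three taxonomic levels on each record lets the same sample be evaluated at coarser granularity when the species-level label is too fine for the available signal.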