🤖 AI Summary
Current species distribution models (SDMs) face three key limitations in variable selection: (1) choosing a different subset of predictors requires retraining the model; (2) robustness to missing environmental covariates is poor; and (3) variable contributions are difficult to quantify and interpret accurately. To address these, the authors propose MaskSDM, a deep learning-based SDM trained with a masked training strategy so that it can make predictions from arbitrary subsets of environmental variables without retraining. The same mechanism makes predictions robust to missing data. By using Shapley values, MaskSDM quantifies each predictor's contribution more precisely than traditional approximations. Evaluated on the global sPlotOpen plant dataset across 12,738 species, MaskSDM outperforms imputation-based baselines and approaches the performance of dedicated models trained on specific variable subsets. The authors present MaskSDM as groundwork for foundation models in SDMs that combine flexibility, missing-data robustness, and explainability.
📝 Abstract
Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species. However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for developing foundation models in SDMs that can be readily applied to diverse ecological applications.
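To make the link between masking and Shapley values concrete, here is a minimal sketch, not the authors' implementation. It assumes a model that accepts a predictor vector together with a visibility mask, and computes Shapley values by enumerating every subset of predictors. All names (`random_mask`, `apply_mask`, `toy_model`, `exact_shapley`), the placeholder value `0.0` for hidden predictors, and the toy weights are hypothetical. Full enumeration costs on the order of 2^n model calls, so it is only practical for a handful of predictors; how MaskSDM itself evaluates Shapley values at scale is not specified in the abstract.

```python
import itertools
import math
import random


def random_mask(n_predictors, rng, keep_prob=0.5):
    """Sample a visibility mask (True = predictor is shown to the model).

    Illustrates how predictor subsets might be drawn during masked training;
    the sampling scheme here is an assumption.
    """
    return [rng.random() < keep_prob for _ in range(n_predictors)]


def apply_mask(x, mask, mask_value=0.0):
    """Replace hidden predictors with a placeholder value.

    The constant placeholder is a hypothetical stand-in for whatever
    representation a real model uses for a masked input.
    """
    return [xi if keep else mask_value for xi, keep in zip(x, mask)]


def toy_model(x, mask):
    """Hypothetical mask-aware model: linear score plus one interaction."""
    weights = [0.5, -1.0, 2.0]
    visible = apply_mask(x, mask)
    score = sum(w * v for w, v in zip(weights, visible))
    # The interaction only contributes when predictors 0 and 2 are both visible.
    if mask[0] and mask[2]:
        score += x[0] * x[2]
    return score


def exact_shapley(model, x):
    """Shapley value of each predictor by enumerating all subsets of the others."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude predictor i.
            weight = (math.factorial(k) * math.factorial(n - k - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, k):
                without_i = [j in subset for j in range(n)]
                with_i = [j in subset or j == i for j in range(n)]
                phi[i] += weight * (model(x, with_i) - model(x, without_i))
    return phi


if __name__ == "__main__":
    rng = random.Random(0)
    print("example training mask:", random_mask(3, rng))
    print("Shapley values:", exact_shapley(toy_model, [1.0, 2.0, 3.0]))
```

For the input `[1.0, 2.0, 3.0]`, the attributions come out at approximately `[2.0, -2.0, 7.5]`: each predictor receives its own linear term, and the interaction between predictors 0 and 2 is split equally between them. The values sum to the difference between the prediction with all predictors visible and the prediction with all of them masked, which is the efficiency property of Shapley values.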