CISO: Species Distribution Modeling Conditioned on Incomplete Species Observations

📅 2025-08-08
🤖 AI Summary
Traditional species distribution models (SDMs) relate species occurrences to abiotic environmental variables and often overlook biotic interactions between species. Existing approaches that do incorporate such interactions typically assume symmetric pairwise relationships and require consistent co-occurrence data, assumptions that are at odds with sparse field observations whose coverage varies from site to site. The paper proposes CISO, a deep learning SDM method that conditions predictions on a flexible number of species observations alongside environmental variables. Evaluated on sPlotOpen (plants), SatBird (birds), and a new dataset, SatButterfly (butterflies), CISO shows improved predictive performance on spatially separate test sets when partial biotic information is included. When conditioned on a subset of species from the same dataset (the "partial-species-known" setting), it outperforms alternative methods in predicting the distribution of the remaining species. Combining observations from multiple datasets can also improve performance, which suggests CISO can draw on heterogeneous observational sources, including species from disparate taxa.

📝 Abstract
Species distribution models (SDMs) are widely used to predict species' geographic distributions, serving as critical tools for ecological research and conservation planning. Typically, SDMs relate species occurrences to environmental variables representing abiotic factors, such as temperature, precipitation, and soil properties. However, species distributions are also strongly influenced by biotic interactions with other species, which are often overlooked. While some methods partially address this limitation by incorporating biotic interactions, they often assume symmetrical pairwise relationships between species and require consistent co-occurrence data. In practice, species observations are sparse, and the availability of information about the presence or absence of other species varies significantly across locations. To address these challenges, we propose CISO, a deep learning-based method for species distribution modeling Conditioned on Incomplete Species Observations. CISO enables predictions to be conditioned on a flexible number of species observations alongside environmental variables, accommodating the variability and incompleteness of available biotic data. We demonstrate our approach using three datasets representing different species groups: sPlotOpen for plants, SatBird for birds, and a new dataset, SatButterfly, for butterflies. Our results show that including partial biotic information improves predictive performance on spatially separate test sets. When conditioned on a subset of species within the same dataset, CISO outperforms alternative methods in predicting the distribution of the remaining species. Furthermore, we show that combining observations from multiple datasets can improve performance. CISO is a promising ecological tool, capable of incorporating incomplete biotic information and identifying potential interactions between species from disparate taxa.
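
To make the idea of conditioning on a flexible number of species observations concrete, here is a minimal sketch of one way a partial survey could be encoded before being passed to a model together with environmental variables. It is an illustration only, not the encoding used by CISO: the three-state scheme, the function names, and the species names are all assumptions introduced for this example.

```python
from typing import Dict, List, Sequence

# Hypothetical state codes (assumption for illustration, not taken from the paper).
UNKNOWN = -1  # species was not surveyed at this site
ABSENT = 0    # species was surveyed and not found
PRESENT = 1   # species was surveyed and found


def encode_observations(species: Sequence[str], observed: Dict[str, bool]) -> List[int]:
    """Map a partial survey onto a fixed-order vector of per-species states."""
    unexpected = set(observed) - set(species)
    if unexpected:
        raise KeyError(f"species not in the modeled list: {sorted(unexpected)}")
    states = []
    for name in species:
        if name not in observed:
            states.append(UNKNOWN)
        else:
            states.append(PRESENT if observed[name] else ABSENT)
    return states


def targets_to_predict(species: Sequence[str], states: Sequence[int]) -> List[str]:
    """Return the species whose state is unknown and must be predicted."""
    return [name for name, state in zip(species, states) if state == UNKNOWN]


# Hypothetical site where only two of four modeled species were surveyed.
species = ["oak", "jay", "monarch", "fern"]
states = encode_observations(species, {"oak": True, "jay": False})
targets = targets_to_predict(species, states)
print(states)   # [1, 0, -1, -1]
print(targets)  # ['monarch', 'fern']
```

Under this sketch, a site with no biotic information at all is simply a vector of unknowns, so the same input format covers both the environment-only case and the partially observed case.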
Problem

Research questions and friction points this paper is trying to address.

SDMs often overlook biotic interactions between species
Existing methods assume symmetrical pairwise relationships and require consistent co-occurrence data
Species observations are sparse, and the available presence or absence information varies across locations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning for incomplete species observations
Flexible conditioning on variable biotic data
Improved prediction with partial biotic information