🤖 AI Summary
Amphibian occurrence records are sparse and incomplete, which limits the accuracy and generalizability of species distribution models. To address this, we propose a deep-learning-based multimodal species distribution modeling framework that combines remote sensing imagery (e.g., NDVI, land cover) with structured environmental covariates, using feature selection and multimodal ensemble learning. Data imputation and data balancing are applied to compensate for the scarcity of labeled occurrence records. For frog habitat classification, the fused model achieves 84.9% accuracy and an AUC of 0.90; for frog counting, data balancing reduces the mean absolute error from 189 to 29. The ensemble also generalizes robustly to unseen geographic regions. The work shows how combining heterogeneous geospatial data with deep learning can support more precise and scalable biodiversity monitoring.
📝 Abstract
Monitoring species distribution is vital for conservation efforts, enabling the assessment of environmental impacts and the development of effective preservation strategies. Traditional data collection methods, including citizen science, offer valuable insights but remain limited in coverage and completeness. Species Distribution Modelling (SDM) helps address these gaps by using occurrence data and environmental variables to predict species presence across large regions. In this study, we enhance SDM accuracy for frogs (Anura) by applying deep learning and data imputation techniques using data from the "EY - 2022 Biodiversity Challenge." Our experiments show that data balancing significantly improved model performance, reducing the Mean Absolute Error (MAE) from 189 to 29 in frog counting tasks. Feature selection identified key environmental factors influencing occurrence, optimizing inputs while maintaining predictive accuracy. The multimodal ensemble model, integrating land cover, NDVI, and other environmental inputs, outperformed individual models and showed robust generalization across unseen regions. The fusion of image and tabular data improved both frog counting and habitat classification, achieving 84.9% accuracy with an AUC of 0.90. This study highlights the potential of multimodal learning and data preprocessing techniques such as balancing and imputation to improve predictive ecological modeling when data are sparse or incomplete, contributing to more precise and scalable biodiversity monitoring.
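To make the reported metrics concrete, the following is a minimal sketch of how predictions from an image-based model and a tabular model could be fused and then scored with MAE, accuracy, and AUC. It is not the authors' implementation: the abstract does not specify the fusion rule, so the weighted average used here is an assumption, and all function names, the `weight_image` parameter, and the 0.5 decision threshold are hypothetical.

```python
def late_fusion(prob_image, prob_tabular, weight_image=0.5):
    """Weighted average of per-site presence probabilities from two models.

    Assumption: simple late fusion; the source does not state the fusion rule.
    """
    if len(prob_image) != len(prob_tabular):
        raise ValueError("both models must score the same sites")
    w = weight_image
    return [w * pi + (1.0 - w) * pt for pi, pt in zip(prob_image, prob_tabular)]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted counts."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, probs, threshold=0.5):
    """Share of sites whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(1 for t, p in zip(y_true, preds) if t == p) / len(y_true)


def roc_auc(y_true, probs):
    """AUC as the probability that a presence site outranks an absence site.

    Ties count as half a win; the pairwise loop is O(n^2), fine for a sketch.
    """
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up values purely for illustration (not data from the study).
    labels = [1, 0, 1, 0]
    fused = late_fusion([0.9, 0.4, 0.6, 0.2], [0.7, 0.2, 0.3, 0.6])
    print("accuracy:", accuracy(labels, fused))
    print("AUC:", roc_auc(labels, fused))
    print("MAE:", mean_absolute_error([10, 0, 5], [7, 2, 5]))
```

In practice a library implementation of these metrics would normally be preferred; the plain-Python versions are only meant to show what each reported number measures.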