AI Summary
Illegal gold mining severely degrades the Amazon rainforest, yet conventional satellite remote sensing struggles to monitor it effectively due to persistent cloud cover and the small spatial footprint of mining sites. To address this challenge, this work introduces ELDOR, the first large-scale, high-resolution drone orthoimagery benchmark dataset specifically designed for detecting illegal gold mining in the Amazon. Spanning over 2,500 hectares, ELDOR provides pixel-level semantic annotations and defines four visual understanding tasks to enable fine-grained identification and ecological disturbance analysis. The project integrates a multi-task evaluation framework, an interactive exploration tool, and remote sensing-tailored models, revealing significant limitations of current methods in recognizing small-scale mining structures and post-mining ecological recovery categories, thereby underscoring the critical need for context-aware and multimodal modeling approaches.
Abstract
Illegal gold mining in the Amazon rainforest causes deforestation, water contamination, and long-term ecosystem disruption, yet remains difficult to monitor at fine spatial scales. Satellite imagery supports large-scale observation, but often misses small mining-related structures and subtle land-cover transitions, especially under frequent cloud cover. We introduce ELDOR, a large-scale UAV benchmark for monitoring environmental and landscape disturbance from illegal gold mining in the rainforest. ELDOR contains manually annotated orthomosaic imagery covering over 2,500 hectares, with pixel-level semantic labels for both mining-related activities and surrounding ecological structures. With this unified annotation source, we establish four benchmark tasks: semantic segmentation, segmentation-derived recognition, direct multi-label classification, and class-presence recognition with vision-language models. Across these tasks, we compare generic and remote-sensing-specific segmentation models, vision foundation model-related segmentation methods, direct multi-label classification methods, and vision-language models under a controlled closed-set protocol. Results show that current methods still struggle with rare small-scale mining structures and fine-grained recovery classes, suggesting the need for context-aware and multimodal modeling. To support domain analysis and practical use, we further build an interactive explorer for domain experts that provides a unified interface for data exploration and model inference.
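One way to read "segmentation-derived recognition" is that image-level class-presence labels are computed from a pixel-level mask rather than predicted directly. The following is a minimal sketch of that idea, not the authors' implementation: the class names, the `presence_from_mask` helper, and the `min_fraction` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical class names; the actual ELDOR label set is not listed here.
CLASS_NAMES = ["forest", "water", "mining_pit", "dredge"]


def presence_from_mask(mask, num_classes, min_fraction=0.0):
    """Turn a 2D mask of class ids into a multi-hot presence vector.

    A class counts as present when its share of pixels is greater than
    `min_fraction` (an assumed threshold, not one taken from the paper).
    """
    counts = Counter(pixel for row in mask for pixel in row)
    total = sum(counts.values())
    if total == 0:
        # An empty mask carries no evidence for any class.
        return [0] * num_classes
    return [
        1 if counts.get(c, 0) / total > min_fraction else 0
        for c in range(num_classes)
    ]


# Toy 3x4 mask: class 0 dominates, classes 1 and 2 cover small regions.
mask = [
    [0, 0, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 2, 1],
]
labels = presence_from_mask(mask, num_classes=len(CLASS_NAMES))
present = [name for name, flag in zip(CLASS_NAMES, labels) if flag]
```

Raising `min_fraction` suppresses classes that occupy only a few pixels. In this toy setup that trade-off is visible immediately: a stricter threshold removes noise but also drops the smallest region, which mirrors the difficulty the abstract reports for rare small-scale mining structures.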