A multi-modal dataset for insect biodiversity with imagery and DNA at the trap and individual level

📅 2025-07-09
🤖 AI Summary
To address the difficulty of automatically classifying unsorted insect specimens from large-scale ecological surveys, and the high cost of annotating them, this study introduces MassID45, a multimodal insect biodiversity dataset that pairs high-resolution imagery with DNA barcodes at both the trap (bulk sample) level and the individual specimen level. Human annotators, supported by an AI-assisted tool, produced instance segmentation masks and taxonomic labels for over 17,000 small specimens in the bulk images. The dataset is intended to support tiny object detection, instance segmentation, and community-level diversity assessment, reducing reliance on labor-intensive manual processing of single specimens. It offers a benchmark resource for ecological monitoring and for cross-disciplinary research on multimodal biological identification.

📝 Abstract
Insects comprise millions of species, many experiencing severe population declines under environmental and habitat changes. High-throughput approaches are crucial for accelerating our understanding of insect diversity, with DNA barcoding and high-resolution imaging showing strong potential for automatic taxonomic classification. However, most image-based approaches rely on individual specimen data, unlike the unsorted bulk samples collected in large-scale ecological surveys. We present the Mixed Arthropod Sample Segmentation and Identification (MassID45) dataset for training automatic classifiers of bulk insect samples. It uniquely combines molecular and imaging data at both the unsorted sample level and the full set of individual specimens. Human annotators, supported by an AI-assisted tool, performed two tasks on bulk images: creating segmentation masks around each individual arthropod and assigning taxonomic labels to over 17 000 specimens. Combining the taxonomic resolution of DNA barcodes with precise abundance estimates of bulk images holds great potential for rapid, large-scale characterization of insect communities. This dataset pushes the boundaries of tiny object detection and instance segmentation, fostering innovation in both ecological and machine learning research.
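
As a rough illustration of how per-specimen labels from a bulk image could feed an abundance and diversity summary, the following Python sketch counts specimens per taxon and computes the Shannon index. The record layout, the example labels, and the function names are assumptions made for illustration; they do not reflect the actual MassID45 file format or the authors' analysis pipeline.

```python
from collections import Counter
import math

# Hypothetical per-specimen annotations for one bulk image:
# (instance_id, order-level label). Illustrative only, not the MassID45 schema.
annotations = [
    (1, "Diptera"),
    (2, "Diptera"),
    (3, "Hymenoptera"),
    (4, "Coleoptera"),
]

def abundance_by_taxon(records):
    """Count segmented specimens per taxonomic label."""
    return Counter(label for _, label in records)

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)

counts = abundance_by_taxon(annotations)
diversity = shannon_index(counts)
print(dict(counts), round(diversity, 4))
```

In this sketch each segmentation mask contributes one count, so abundance comes from the image annotations while the label could, in principle, be refined by a DNA barcode match.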
Problem

Research questions and friction points this paper is trying to address.

Develop a dataset for automatic classification of bulk insect samples
Combine DNA barcoding and imaging for insect biodiversity analysis
Improve tiny object detection and segmentation in ecological research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines DNA barcoding with high-resolution imaging
AI-assisted segmentation and taxonomic labeling
Enables tiny object detection in bulk samples
Johanna Orsholm
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
John Quinto
University of Guelph, Guelph, Ontario, Canada
Hannu Autto
Kilpisjärvi Biological Station, University of Helsinki, Helsinki, Finland
Gaia Banelyte
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Nicolas Chazot
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Jeremy deWaard
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
Stephanie deWaard
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
Arielle Farrell
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Brendan Furneaux
Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
Bess Hardwick
Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
Nao Ito
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
Amlan Kar
NVIDIA, University of Toronto
Oula Kalttopää
Kilpisjärvi Biological Station, University of Helsinki, Helsinki, Finland
Deirdre Kerdraon
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Erik Kristensen
Unit for Field-based Forest Research, Swedish University of Agricultural Sciences, Umeå, Sweden
Jaclyn McKeown
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
Tommi Mononen
Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
Ellen Nein
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Hanna Rogers
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Tomas Roslin
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Paula Schmitz
Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Jayme Sones
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
Maija Sujala
Kilpisjärvi Biological Station, University of Helsinki, Helsinki, Finland
Amy Thompson
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
Evgeny V. Zakharov
Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada