🤖 AI Summary
To address the lack of efficient, automated pipelines for beetle image analysis, this paper proposes a three-stage deep learning pipeline. First, an iterative detection process combines a transformer-based open-vocabulary object detector with a vision-language model (VLM) to localize every beetle in a tray image. Second, the detected beetles are sorted and cropped into individual images. Third, two variants of a transformer-based segmentation model are fine-tuned on 670 manually labeled beetle images to produce fine-grained morphological segmentation with relatively high accuracy. The key contributions are (i) the combination of open-vocabulary detection and a VLM in an iterative detection loop and (ii) a fine-tuned segmentation model specialized for beetle morphology. Designed for collections of thousands of tray images, the pipeline is intended to greatly improve the efficiency of processing large-scale beetle data and to support research in entomology and ecology.
📝 Abstract
In entomology and ecology research, biologists often need to collect large numbers of insects, among which beetles are the most common group. A common way for biologists to organize beetles is to place them on trays and photograph each tray. Given images of thousands of such trays, an automated pipeline is needed to process this large-scale data for further research. We therefore develop a three-stage pipeline that detects all the beetles on each tray, sorts and crops the image of each beetle, and performs morphological segmentation on the cropped beetles. For detection, we design an iterative process that uses a transformer-based open-vocabulary object detector and a vision-language model. For segmentation, we manually labeled 670 beetle images and fine-tuned two variants of a transformer-based segmentation model to achieve fine-grained segmentation of beetles with relatively high accuracy. The pipeline integrates multiple deep learning methods and is specialized for beetle image processing; it can greatly improve the efficiency of processing large-scale beetle data and accelerate biological research.
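The abstract does not say how the second stage orders or crops the detections, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes the detector returns axis-aligned pixel boxes as `(x_min, y_min, x_max, y_max)`, that beetles sit in roughly horizontal rows on the tray, and that the image is a plain nested list of pixel rows. The names `sort_reading_order`, `crop`, `row_tol`, and `pad` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def sort_reading_order(boxes: Sequence[Box], row_tol: float = 0.5) -> List[Box]:
    """Order boxes top-to-bottom by row, then left-to-right within a row.

    A box joins the current row when its vertical centre lies within
    row_tol * (height of the row's first box) of that first box's centre.
    This grouping rule is an illustrative assumption.
    """
    if not boxes:
        return []
    by_centre_y = sorted(boxes, key=lambda b: (b[1] + b[3]) / 2)
    rows: List[List[Box]] = [[by_centre_y[0]]]
    for box in by_centre_y[1:]:
        anchor = rows[-1][0]
        anchor_cy = (anchor[1] + anchor[3]) / 2
        box_cy = (box[1] + box[3]) / 2
        if abs(box_cy - anchor_cy) <= row_tol * (anchor[3] - anchor[1]):
            rows[-1].append(box)
        else:
            rows.append([box])
    # Flatten the rows, sorting each one by its left edge.
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


def crop(image: Sequence[Sequence[int]], box: Box, pad: int = 0) -> List[List[int]]:
    """Cut one box out of an H x W image, with optional padding clamped to the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0 = max(0, box[0] - pad), max(0, box[1] - pad)
    x1, y1 = min(width, box[2] + pad), min(height, box[3] + pad)
    return [list(row[x0:x1]) for row in image[y0:y1]]


if __name__ == "__main__":
    detections = [(60, 5, 80, 25), (10, 0, 30, 20), (12, 50, 32, 70), (55, 52, 75, 72)]
    for index, box in enumerate(sort_reading_order(detections)):
        print(index, box)
```

Sorting before cropping gives each cropped beetle a stable index that can be matched back to its position on the tray. In practice the crop would more likely be done with an image library than with nested lists.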