🤖 AI Summary
Mold colony detection for indoor air quality assessment typically requires large-scale annotated datasets and extensive training time. Method: This study investigates the few-shot transfer capability of vision foundation models under extremely limited labeling budgets. Leveraging a custom-built dataset of 5,000 colony images—each annotated with bounding boxes and pixel-level masks—we comparatively evaluate MaskDINO and YOLOv9. Contribution/Results: MaskDINO achieves performance on par with fully trained YOLOv9 using only 150 fine-tuning images; remarkably, it remains reliable on roughly 70% of samples even with as few as 25 annotated images. To our knowledge, this is the first work demonstrating that general-purpose vision foundation models exhibit high data efficiency and strong generalization in fine-grained microbial image detection. The approach substantially reduces annotation costs and establishes a novel paradigm for automated microbial analysis in resource-constrained settings.
📝 Abstract
Quantifying mold colonies on Petri dish samples is of critical importance for the assessment of indoor air quality, as high colony counts can indicate potential health risks and deficiencies in ventilation systems. Conventionally, automating such a labor-intensive process, like other tasks in microbiology, relies on the manual annotation of large datasets and the subsequent extensive training of models such as YOLOv9. To demonstrate that exhaustive annotation is no longer a prerequisite when tackling a new vision task, we compile a representative dataset of 5,000 Petri dish images annotated with bounding boxes, simulating both a traditional data collection approach and few-shot and low-shot scenarios with well-curated subsets carrying instance-level masks. We benchmark three vision foundation models against traditional baselines on task-specific metrics reflecting realistic real-world requirements. Notably, MaskDINO attains near-parity with an extensively trained YOLOv9 model while fine-tuned on only 150 images, and retains competitive performance with as few as 25 images, remaining reliable on $\approx$ 70% of the samples. Our results show that data-efficient foundation models can match traditional approaches with only a fraction of the required data, enabling earlier development and faster iterative improvement of automated microbiological systems, with an upper-bound performance superior to what traditional models would achieve.