Automatic Image-Level Morphological Trait Annotation for Organismal Images

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a bottleneck in ecological research: manual annotation of morphological traits in biological images is time-consuming and expert-dependent. To overcome this limitation, the authors propose a modular, scalable framework for automated trait annotation: foundation vision models extract features, sparse autoencoders trained on those features yield neurons that respond selectively to specific morphological regions, and vision–language prompting turns the localized regions into spatially precise, biologically plausible trait descriptions. The approach achieves, for the first time, end-to-end generation of interpretable textual trait descriptions directly from images. The work introduces Bioscan-Traits, a dataset of 19K insect images with 80K trait annotations, and validates the method's robustness and the biological plausibility of its outputs through human evaluation and ablation studies.
📝 Abstract
Morphological traits are physical characteristics of biological organisms that provide vital clues on how organisms interact with their environment. Yet extracting these traits remains a slow, expert-driven process, limiting their use in large-scale ecological studies. A major bottleneck is the absence of high-quality datasets linking biological images to trait-level annotations. In this work, we demonstrate that sparse autoencoders trained on foundation-model features yield monosemantic, spatially grounded neurons that consistently activate on meaningful morphological parts. Leveraging this property, we introduce a trait annotation pipeline that localizes salient regions and uses vision-language prompting to generate interpretable trait descriptions. Using this approach, we construct Bioscan-Traits, a dataset of 80K trait annotations spanning 19K insect images from BIOSCAN-5M. Human evaluation confirms the biological plausibility of the generated morphological descriptions. We assess design sensitivity through a comprehensive ablation study, systematically varying key design choices and measuring their impact on the quality of the resulting trait descriptions. By annotating traits with a modular pipeline rather than prohibitively expensive manual efforts, we offer a scalable way to inject biologically meaningful supervision into foundation models, enable large-scale morphological analyses, and bridge the gap between ecological relevance and machine-learning practicality.
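The pipeline described in the abstract (frozen foundation-model patch features → sparse autoencoder → neuron-based localization → vision–language prompting) can be sketched roughly as follows. This is a minimal illustration only: the features are random stand-ins for a frozen backbone's patch embeddings, the sparse autoencoder is untrained, and all dimensions, variable names, and the L1 penalty are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen foundation-model patch features: a 19x19 patch grid
# per image, 64-dim features (real backbones are higher-dimensional).
n_patches, d_model, d_hidden = 361, 64, 256
features = rng.normal(size=(n_patches, d_model)).astype(np.float32)

# Sparse autoencoder: ReLU encoder, linear decoder. Weights here are
# random (no training loop shown); in the paper's setting they would be
# fit by minimizing the reconstruction + sparsity loss below.
W_enc = rng.normal(scale=0.1, size=(d_model, d_hidden)).astype(np.float32)
b_enc = np.zeros(d_hidden, dtype=np.float32)
W_dec = W_enc.T.copy()

def sae_forward(x):
    codes = np.maximum(x @ W_enc + b_enc, 0.0)  # sparse, non-negative codes
    recon = codes @ W_dec
    return codes, recon

def sae_loss(x, l1=1e-3):
    codes, recon = sae_forward(x)
    return np.mean((recon - x) ** 2) + l1 * np.mean(np.abs(codes))

# Localization readout: for a given SAE neuron, the patches where it
# fires most strongly act as a saliency map over the image; those
# patches would then be described via vision-language prompting.
codes, _ = sae_forward(features)
neuron = int(np.argmax(codes.mean(axis=0)))      # a frequently active unit
top_patches = np.argsort(codes[:, neuron])[-5:]  # candidate trait region
print(neuron, top_patches)
```

In the full pipeline, the highlighted patches for a monosemantic neuron would be cropped or masked and passed, together with a prompt, to a vision–language model to produce the textual trait description.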
Problem

Research questions and friction points this paper addresses.

morphological traits
automatic annotation
organismal images
trait-level annotations
scalable annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoder
foundation model
morphological trait annotation
vision-language prompting
monosemantic representation