🤖 AI Summary
To address the challenges of fine-grained fish analysis and total-length measurement, this paper introduces AutoFish, a novel, publicly available benchmark dataset for intelligent, sustainable fishery monitoring. It comprises 1,500 images of 454 visually similar fish specimens, annotated with instance segmentation masks, individual IDs, and ground-truth total-length measurements. A semi-automated annotation pipeline, leveraging the Segment Anything Model (SAM) for pre-labeling followed by manual correction, is designed to ensure both efficiency and accuracy. Baseline results are established for both tasks: an adapted Mask2Former achieves 89.15% mAP for instance segmentation, and a custom MobileNetV2-based regression model attains mean absolute errors of 0.62 cm and 1.38 cm for total-length estimation on unoccluded and occluded images, respectively.
📝 Abstract
Automated fish documentation processes are expected to play an essential role in sustainable fisheries management and in addressing the challenges of overfishing in the near future. In this paper, we present a novel, publicly available dataset named AutoFish, designed for fine-grained fish analysis. The dataset comprises 1,500 images of 454 visually similar fish specimens placed in various constellations on a white conveyor belt and annotated with instance segmentation masks, IDs, and length measurements. The data was collected in a controlled environment using an RGB camera. The annotation procedure involved manual point annotations, initial segmentation masks proposed by the Segment Anything Model (SAM), and subsequent manual correction of the masks. We establish baseline instance segmentation results using two variations of the Mask2Former architecture, with the best-performing model reaching an mAP of 89.15%. Additionally, we present two baseline length estimation methods, the best performing being a custom MobileNetV2-based regression model that reaches an MAE of 0.62 cm in images without occlusion and 1.38 cm in images with occlusion. Link to project page: https://vap.aau.dk/autofish/.
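The abstract does not detail the non-learned length estimation baseline. One plausible geometric approach (an assumption for illustration, not necessarily the paper's method) is to measure a fish's extent along the principal axis of its segmentation mask, then convert pixels to centimeters with a known camera scale:

```python
import numpy as np

def mask_length_cm(mask: np.ndarray, cm_per_pixel: float) -> float:
    """Estimate total length from a binary instance mask.

    Hypothetical geometric baseline: project mask pixels onto their
    principal axis (via the covariance eigendecomposition) and return
    the extent along that axis, scaled by a fixed pixel-to-cm factor.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)  # center the point cloud
    # Principal axis = eigenvector of the covariance with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(pts.T))
    axis = eigvecs[:, np.argmax(eigvals)]
    proj = pts @ axis  # signed distance of each pixel along the axis
    return float(proj.max() - proj.min()) * cm_per_pixel
```

Such a projection-based estimate works well for straight, unoccluded fish but degrades for curved bodies or partial masks, which is consistent with a learned regressor outperforming it in occluded scenes.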