🤖 AI Summary
This study addresses the challenge of simultaneously achieving high accuracy and real-time performance in fine-grained segmentation and gender classification of waterhemp plants in agricultural settings. The authors propose a lightweight multi-task Vision Transformer architecture that, for the first time, systematically integrates structural reparameterization across the entire network—including the Vision Transformer backbone, a Lite R-ASPP decoder, and a reparameterizable classification head—enabling high representational capacity during training and efficient deployment at inference. Evaluated on a newly curated dataset of 10,264 annotated waterhemp frames, the model achieves a segmentation mIoU of 92.18% and a gender classification accuracy of 81.91%, with only 3.59M parameters and 3.80 GFLOPs, while running at 108.95 FPS—outperforming iFormer-T in both efficiency and accuracy.
📝 Abstract
We present WeedRepFormer, a lightweight multi-task Vision Transformer designed for simultaneous waterhemp segmentation and gender classification. Existing agricultural models often struggle to balance the fine-grained feature extraction required for biological attribute classification with the efficiency needed for real-time deployment. To address this, WeedRepFormer systematically integrates structural reparameterization across the entire architecture (a Vision Transformer backbone, a Lite R-ASPP decoder, and a novel reparameterizable classification head) to decouple training-time capacity from inference-time latency. We also introduce a comprehensive waterhemp dataset containing 10,264 annotated frames from 23 plants. On this benchmark, WeedRepFormer achieves 92.18% mIoU for segmentation and 81.91% accuracy for gender classification using only 3.59M parameters and 3.80 GFLOPs. At 108.95 FPS, our model outperforms the state-of-the-art iFormer-T by 4.40% in classification accuracy while maintaining competitive segmentation performance and reducing the parameter count by 1.9×.
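The key mechanism here — structural reparameterization — trains a block with several parallel branches for extra capacity, then algebraically folds them into a single operator for fast inference. A minimal NumPy sketch (RepVGG-style branch merging on a single-channel conv; an illustrative assumption, not the authors' exact WeedRepFormer blocks) shows why the folded kernel is mathematically equivalent:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' cross-correlation."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))   # 3x3 branch
k1 = rng.standard_normal((1, 1))   # 1x1 branch

# Training-time forward: three parallel branches (3x3 + 1x1 + identity).
y_train = conv2d_same(x, k3) + conv2d_same(x, k1) + x

# Inference-time reparameterization: fold the 1x1 branch and the
# identity shortcut into the center tap of a single 3x3 kernel.
k_merged = k3.copy()
k_merged[1, 1] += k1[0, 0] + 1.0

y_infer = conv2d_same(x, k_merged)
assert np.allclose(y_train, y_infer)  # one conv, same output
```

The same algebra extends to multi-channel convolutions with batch-norm folding, which is how training-time capacity is bought at zero inference cost.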