🤖 AI Summary
This study addresses the challenge of simultaneously achieving high accuracy and real-time performance in fine-grained segmentation and gender classification of waterhemp plants in agricultural settings. The authors propose a lightweight multi-task Vision Transformer architecture that, for the first time, systematically integrates structural reparameterization across the entire network—including the Vision Transformer backbone, a Lite R-ASPP decoder, and a reparameterizable classification head—enabling high representational capacity during training and efficient deployment at inference. Evaluated on a newly curated dataset of 10,264 annotated waterhemp frames, the model achieves a segmentation mIoU of 92.18% and a gender classification accuracy of 81.91%, with only 3.59M parameters and 3.80 GFLOPs, while running at 108.95 FPS—outperforming iFormer-T in both efficiency and accuracy.
📝 Abstract
We present WeedRepFormer, a lightweight multi-task Vision Transformer designed for simultaneous waterhemp segmentation and gender classification. Existing agricultural models often struggle to balance the fine-grained feature extraction required for biological attribute classification with the efficiency needed for real-time deployment. To address this, WeedRepFormer systematically integrates structural reparameterization across the entire architecture (a Vision Transformer backbone, a Lite R-ASPP decoder, and a novel reparameterizable classification head) to decouple training-time capacity from inference-time latency. We also introduce a comprehensive waterhemp dataset containing 10,264 annotated frames from 23 plants. On this benchmark, WeedRepFormer achieves 92.18% mIoU for segmentation and 81.91% accuracy for gender classification using only 3.59M parameters and 3.80 GFLOPs. At 108.95 FPS, our model outperforms the state-of-the-art iFormer-T by 4.40% in classification accuracy while maintaining competitive segmentation performance and reducing the parameter count by 1.9×.
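The key mechanism here — structural reparameterization — trains a block with several parallel branches for extra capacity, then algebraically folds them into a single operator for fast inference. A minimal NumPy sketch (RepVGG-style branch merging on a single-channel conv; an illustrative assumption, not the authors' exact WeedRepFormer blocks) shows why the folded kernel is mathematically equivalent:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' cross-correlation."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))   # 3x3 branch
k1 = rng.standard_normal((1, 1))   # 1x1 branch

# Training-time forward: three parallel branches (3x3 + 1x1 + identity).
y_train = conv2d_same(x, k3) + conv2d_same(x, k1) + x

# Inference-time reparameterization: fold the 1x1 branch and the
# identity shortcut into the center tap of a single 3x3 kernel.
k_merged = k3.copy()
k_merged[1, 1] += k1[0, 0] + 1.0

y_infer = conv2d_same(x, k_merged)
assert np.allclose(y_train, y_infer)  # one conv, same output
```

The same algebra extends to multi-channel convolutions with batch-norm folding, which is how training-time capacity is bought at zero inference cost.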