TasselNetV4: A vision foundation model for cross-scene, cross-scale, and cross-species plant counting

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional single-species plant counting models suffer from poor generalization due to the continuous emergence of new plant species and highly variable imaging conditions—including diverse scenes, scales, and occlusions.

Method: We propose the first vision foundation model for universal plant counting—capable of cross-scene, cross-scale, and cross-species generalization. Inspired by class-agnostic counting, we introduce a multi-branch box-aware local counting module that jointly performs local density estimation and feature extraction–matching to robustly model plants' dynamic, non-rigid structures. The model is built upon a pure vision Transformer architecture and trained on two newly constructed datasets: PAC-105 and PAC-Somalia.

Results: Extensive experiments demonstrate significant improvements over state-of-the-art class-agnostic counting methods across multiple challenging benchmarks, achieving higher accuracy (18.3% lower MAE), strong robustness to scale and scene variations, and efficient inference—establishing a scalable foundation model paradigm for plant biodiversity monitoring.

📝 Abstract
Accurate plant counting provides valuable information for agriculture, such as crop yield prediction, plant density assessment, and phenotype quantification. Vision-based approaches are currently the mainstream solution. Prior art typically uses a detection or a regression model to count a specific plant. However, plants have biodiversity, and new cultivars are bred each year; it is almost impossible to exhaust and build all species-dependent counting models. Inspired by class-agnostic counting (CAC) in computer vision, we argue that it is time to rethink the problem formulation of plant counting, from what plants to count to how to count plants. In contrast to most daily objects with spatial and temporal invariance, plants are dynamic, changing with time and space. Their non-rigid structure often leads to worse performance than counting rigid instances like heads and cars, such that current CAC and open-world detection models are suboptimal for counting plants. In this work, we inherit the vein of the TasselNet plant counting model and introduce a new extension, TasselNetV4, shifting from species-specific counting to cross-species counting. TasselNetV4 marries the local counting idea of TasselNet with the extract-and-match paradigm in CAC. It builds upon a plain vision transformer and incorporates novel multi-branch box-aware local counters to enhance cross-scale robustness. Two challenging datasets, PAC-105 and PAC-Somalia, are harvested. Extensive experiments against state-of-the-art CAC models show that TasselNetV4 achieves not only superior counting performance but also high efficiency. Our results indicate that TasselNetV4 emerges as a vision foundation model for cross-scene, cross-scale, and cross-species plant counting.
Problem

Research questions and friction points this paper is trying to address.

Developing a vision model for counting plants across different species and environments
Addressing limitations of species-specific counting models due to plant biodiversity
Improving cross-scale robustness for dynamic plant counting in agriculture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vision transformer for cross-species plant counting
Integrates multi-branch box-aware local counters
Combines local counting with extract-and-match paradigm
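The combination above can be illustrated with a minimal sketch. The paper itself does not publish this code; the snippet below is an assumption-based toy showing the two ideas it names: an extract-and-match step that correlates image features with an exemplar descriptor (as in class-agnostic counting), and TasselNet-style local counting that sums densities over patches instead of detecting individual instances. Function names (`local_count_map`, `count_from_patches`) and the feature shapes are hypothetical.

```python
import numpy as np

def local_count_map(feat, exemplar_feat):
    """Extract-and-match (sketch): correlate an (H, W, C) feature map with a
    pooled (C,) exemplar descriptor and treat the response as a density map."""
    sim = feat @ exemplar_feat        # (H, W) matching scores
    return np.maximum(sim, 0.0)       # densities are non-negative

def count_from_patches(count_map, patch=4):
    """TasselNet-style local counting (sketch): sum the density inside
    non-overlapping local patches, then sum patch counts for the image total."""
    H, W = count_map.shape
    Hc, Wc = H // patch, W // patch
    patches = count_map[:Hc * patch, :Wc * patch].reshape(Hc, patch, Wc, patch)
    local = patches.sum(axis=(1, 3))  # one count per local patch
    return local.sum()

# Toy example: a feature map where exactly two locations match the exemplar.
feat = np.zeros((8, 8, 3))
feat[1, 1] = feat[6, 5] = np.array([1.0, 0.0, 0.0])
exemplar = np.array([1.0, 0.0, 0.0])
total = count_from_patches(local_count_map(feat, exemplar))  # ≈ 2.0
```

In the actual model, the matching and counting are learned jointly by the multi-branch box-aware local counters on vision-transformer features; this sketch only conveys the data flow, not the trained architecture.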
Xiaonan Hu
National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Xuebing Li
China-Poland Joint Laboratory on Measurement and Control Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Jinyu Xu
School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China
Abdulkadir Duran Adan
National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Letian Zhou
National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Xuhui Zhu
National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Yanan Li
School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan, 450205, China
Wei Guo
Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Midori-cho, Nishitokyo City, Tokyo, Japan
Shouyang Liu
Professor, Nanjing Agricultural University
Phenotyping · Crop modeling · Remote sensing in agriculture
Wenzhong Liu
China-Poland Joint Laboratory on Measurement and Control Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Hao Lu
National Key Laboratory of Multispectral Information Intelligent Processing Technology, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China