🤖 AI Summary
In multi-view plant phenotyping, redundant viewpoints cause feature overlap, degrading accuracy in plant age prediction and leaf count estimation. To address this, we propose a view sparsification framework that introduces learnable selection vectors and matrices to dynamically identify the most discriminative subset of views, enabling robust view-invariant feature learning. Our method integrates stochastic view sampling with multi-view feature aggregation and is optimized end-to-end on the Growth Modelling Challenge dataset, which comprises multi-height, multi-angle plant imagery. Experiments demonstrate state-of-the-art performance on both plant age prediction and leaf count estimation, outperforming single-view and conventional multi-view baselines. The approach enhances modeling efficiency and generalization across heterogeneous visual phenotypic sources, advancing scalable and robust multi-view plant phenotyping.
📝 Abstract
Plant phenotyping involves analyzing observable characteristics of plants to better understand their growth, health, and development. In the context of deep learning, this analysis is often approached through single-view classification or regression models. However, a single view frequently fails to capture all the information required to estimate the target phenotypic traits accurately, which can adversely affect plant health assessment and harvest readiness prediction. To address this, the Growth Modelling (GroMo) Grand Challenge at ACM Multimedia 2025 provides a multi-view dataset featuring multiple plants and two tasks: Plant Age Prediction and Leaf Count Estimation. Each plant is photographed from multiple heights and angles, leading to significant overlap and redundancy in the captured information. To learn view-invariant embeddings, we randomly select views from a set of 24 views; we refer to this selection as the selection vector. Our ViewSparsifier approach won both tasks. As a further improvement and a direction for future research, we also experimented with randomized view selection across all five height levels (120 views in total), referred to as selection matrices.
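The random view selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of selected views, the 0/1 mask representation, and the mean-pooling aggregation are all assumptions made for illustration. The only figures taken from the abstract are the 24 views of the selection vector and the five height levels (120 views in total) of the selection matrix, which implies 24 views per height level.

```python
import random

NUM_HEIGHTS = 5   # height levels (from the abstract)
NUM_ANGLES = 24   # views per height level (120 views / 5 levels)


def sample_selection_vector(num_selected, rng, num_angles=NUM_ANGLES):
    """Return a 0/1 list of length num_angles with exactly num_selected ones.

    Hypothetical helper: marks a random subset of the 24 views as selected.
    """
    chosen = set(rng.sample(range(num_angles), num_selected))
    return [1 if i in chosen else 0 for i in range(num_angles)]


def sample_selection_matrix(num_selected, rng,
                            num_heights=NUM_HEIGHTS, num_angles=NUM_ANGLES):
    """Return a num_heights x num_angles 0/1 matrix with num_selected ones.

    Hypothetical helper: the subset is drawn over all 120 views at once.
    """
    chosen = set(rng.sample(range(num_heights * num_angles), num_selected))
    return [
        [1 if h * num_angles + a in chosen else 0 for a in range(num_angles)]
        for h in range(num_heights)
    ]


def aggregate(view_features, mask):
    """Mean-pool the feature vectors of the selected views.

    Mean pooling is an assumed stand-in for whatever aggregation is used.
    """
    selected = [f for f, m in zip(view_features, mask) if m]
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


rng = random.Random(0)                   # seeded for reproducibility
vector = sample_selection_vector(6, rng)   # 6 is an arbitrary choice
matrix = sample_selection_matrix(12, rng)  # 12 is an arbitrary choice
print(sum(vector), sum(map(sum, matrix)))
```

Resampling the mask at every training step exposes the model to different view subsets of the same plant, which is one plausible way in which random selection could encourage view-invariant embeddings.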