Efficient Spatio-Temporal Vegetation Pixel Classification with Vision Transformers

📅 2026-04-30

🤖 AI Summary
This study addresses the low computational efficiency and poor scalability of pixel-level, multi-temporal classification in high-resolution vegetation monitoring. The authors systematically optimize the Vision Transformer architecture and demonstrate its efficiency and scalability for vegetation phenology monitoring. Ablation studies cover seven design dimensions: data normalization, spectral ordering, boundary handling, spatial windowing, tokenization strategy, positional encoding, and feature aggregation. The method is validated on UAV and ground-based (near-surface) imagery from the Brazilian Cerrado biome. Compared to multi-temporal CNN baselines, it reduces FLOPs by an order of magnitude while maintaining competitive classification performance, and its parameter count stays constant regardless of time series length, making it well suited to resource-constrained phenological monitoring.
📝 Abstract
Plant phenology, the study of recurrent life cycle events, is essential for understanding ecosystem dynamics and their responses to climate change. While Unmanned Aerial Vehicles (UAVs) and near-surface cameras enable high-resolution monitoring, identifying plant species across time remains computationally challenging. State-of-the-art approaches, specifically Multi-Temporal Convolutional Neural Networks (CNNs), rely on rigid multi-branch architectures that scale poorly with longer time series and require large spatial context windows. In this paper, we present an extensive study on optimizing Vision Transformers (ViTs) for efficient spatio-temporal vegetation pixel classification. We conducted a comprehensive ablation study analyzing seven key design dimensions: (i) data normalization; (ii) spectral arrangement; (iii) boundary handling; (iv) spatial context window shape and size; (v) tokenization strategies; (vi) positional encoding; and (vii) feature aggregation strategies. Our method was evaluated on two datasets from the Brazilian Cerrado biome: Serra do Cipó (aerial imagery) and Itirapina (near-surface imagery). Experimental results demonstrate that our ViT approach offers a substantial improvement in computational efficiency while maintaining competitive classification performance. Notably, our ViT reduces Floating Point Operations (FLOPs) by an order of magnitude and maintains constant parameter complexity regardless of the time series length, whereas the parameter count of the CNN baseline scales linearly with it. Our findings confirm that ViTs are a robust, scalable solution for resource-constrained phenological monitoring systems.
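The scaling claim in the abstract (constant parameter count for the ViT, linear growth for the multi-branch CNN) can be illustrated with a rough parameter tally. The sketch below is not the authors' architecture: the layer sizes, the two-layer branch, the choice of one token per acquisition date, and the function names `vit_param_count` and `multibranch_cnn_param_count` are all assumptions made for illustration. It also assumes a fixed (non-learned) positional encoding by default, since a learned table of position embeddings would add a small term that grows with sequence length.

```python
def vit_param_count(num_steps, token_dim=75, d_model=64, mlp_dim=128,
                    depth=2, num_classes=6, learned_positions=False):
    """Parameter tally for a small Transformer encoder classifier (hypothetical sizes).

    Every token passes through the same weights, so num_steps only matters
    if the positional encoding is a learned table (one vector per position).
    """
    embed = token_dim * d_model + d_model            # linear token projection + bias
    attn = 4 * (d_model * d_model + d_model)         # Q, K, V and output projections
    mlp = (d_model * mlp_dim + mlp_dim) + (mlp_dim * d_model + d_model)
    norms = 2 * (2 * d_model)                        # two LayerNorms (scale and shift)
    head = d_model * num_classes + num_classes       # classification layer
    positions = num_steps * d_model if learned_positions else 0
    return embed + depth * (attn + mlp + norms) + head + positions


def multibranch_cnn_param_count(num_steps, in_channels=3, width=32,
                                kernel=3, num_classes=6):
    """Parameter tally for a CNN with one independent branch per time step (hypothetical sizes)."""
    conv1 = in_channels * width * kernel * kernel + width
    conv2 = width * width * kernel * kernel + width
    branch = conv1 + conv2                           # each time step gets its own copy
    head = (num_steps * width) * num_classes + num_classes  # classifier over concatenated features
    return num_steps * branch + head


if __name__ == "__main__":
    for steps in (4, 8, 16, 32):
        print(steps, vit_param_count(steps), multibranch_cnn_param_count(steps))
```

Under these assumptions, the first count is identical for every sequence length, while the second grows by a fixed amount per added time step. The tally covers parameters only; the FLOPs reduction reported in the abstract depends on window size and token count and is not modeled here.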
Problem

Research questions and friction points this paper is trying to address.

spatio-temporal classification
vegetation monitoring
plant phenology
computational efficiency
time series scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Transformers
spatio-temporal classification
vegetation phenology
computational efficiency
UAV imagery
Alan Gomes
Department of Computing, Federal University of São Carlos (UFSCar), Sorocaba, 18052-780, Brazil
Anderson Gonçalves
Department of Computing, Federal University of São Carlos (UFSCar), Sorocaba, 18052-780, Brazil
Samuel Felipe dos Santos
Department of Computing, Federal University of São Carlos (UFSCar), Sorocaba, 18052-780, Brazil
Nathan Felipe Alves
Center for Research on Biodiversity Dynamics and Climate Change and Department of Biodiversity, Bioscience Institute, São Paulo State University (UNESP), Rio Claro, 13506-900, Brazil
Magna Soelma Beserra de Moura
Brazilian Agricultural Research Corporation (EMBRAPA), Fortaleza, 60511-110, Brazil
Bruna de Costa Alberton
Center for Research on Biodiversity Dynamics and Climate Change and Department of Biodiversity, Bioscience Institute, São Paulo State University (UNESP), Rio Claro, 13506-900, Brazil
Leonor Patricia C. Morellato
Center for Research on Biodiversity Dynamics and Climate Change and Department of Biodiversity, Bioscience Institute, São Paulo State University (UNESP), Rio Claro, 13506-900, Brazil
Ricardo da Silva Torres
Professor in Data Science and Artificial Intelligence, Wageningen University & Research
multimedia analysis, multimedia retrieval, machine learning, information visualization, databases
Jurandy Almeida
Associate Professor, Federal University of São Carlos (UFSCar)
Databases, Image Processing, Computer Vision, Information Retrieval