🤖 AI Summary
To address the challenges of scarce training data and complex field conditions that hinder efficient fine-tuning of the Segment Anything Model (SAM) in agricultural settings, this paper proposes Dynamic Similarity-based Graph Adaptation (DSGA), tailored for foreground and instance segmentation of small, densely packed crop targets such as chickpea pods. DSGA integrates dynamic similarity graph construction, learnable weight ranking, adaptive local feature aggregation, and LoRA-based low-rank parameter updates, jointly modeling global dependencies and local details with only 4.00M trainable parameters. A learnable polynomial decay initialization improves convergence stability, while Grad-CAM and t-SNE enable interpretable analysis. On the chickpea pod dataset, DSGA achieves a 17.31% improvement in Structure-measure and a 62.36% gain in adaptive F-measure under 2-10-shot settings, with pod-counting correlation reaching an adjusted R² of 0.8987, substantially outperforming existing PEFT methods.
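The core adaptation step, building a similarity graph over image tokens and aggregating each token's top-ranked neighbors with rank-decayed weights, can be sketched as follows. This is a minimal illustration assuming cosine similarity, a fixed neighbor count `k`, and a simple `(k - i)^power` decay; the paper's exact graph construction and learnable-weight parameterization are not reproduced here.

```python
import torch
import torch.nn.functional as F


def polynomial_decay_weights(k: int, power: float = 2.0) -> torch.Tensor:
    """Initialize k neighbor-ranking weights with polynomial decay.

    Hypothetical schedule: w_i proportional to (k - i)^power over ranks
    0..k-1, normalized to sum to 1. In DSGA these would be a learnable
    starting point, not fixed constants.
    """
    ranks = torch.arange(k, dtype=torch.float32)
    w = (k - ranks) ** power
    return w / w.sum()


def similarity_graph_aggregate(tokens: torch.Tensor, k: int = 4) -> torch.Tensor:
    """For each token, find its k most cosine-similar tokens and
    aggregate their features with rank-decayed weights.

    tokens: (N, D) image-embedding tokens; returns (N, D) aggregated features.
    """
    normed = F.normalize(tokens, dim=-1)
    sim = normed @ normed.T                # (N, N) cosine-similarity graph
    sim.fill_diagonal_(-float("inf"))      # exclude self-edges
    topk = sim.topk(k, dim=-1)             # ranked neighbor indices per token
    w = polynomial_decay_weights(k)        # (k,) decayed ranking weights
    neighbors = tokens[topk.indices]       # (N, k, D) gathered neighbor features
    return (w[None, :, None] * neighbors).sum(dim=1)


tokens = torch.randn(16, 32)               # toy stand-in for SAM patch embeddings
aggregated = similarity_graph_aggregate(tokens, k=4)
```

The rank-based decay gives the most similar neighbor the largest initial influence while still letting gradient updates reshape the weighting, which is consistent with the summary's claim that the decayed initialization stabilizes convergence.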
📝 Abstract
Parameter-Efficient Fine-Tuning (PEFT) of foundation models for agricultural computer vision tasks remains challenging due to limited training data and complex field conditions. This study introduces a Dynamic Similarity-based Graph Adaptation (DSGA) module to adapt the Segment Anything Model (SAM) under extreme data constraints for precise foreground and instance segmentation of small dense objects in complex agricultural environments. Through dynamic similarity graph construction with a learnable polynomial decay-initialized weight ranking mechanism and adaptive local feature aggregation, DSGA establishes robust spatial and dynamic similarity representation with only 4.00M trainable parameters, 4.26% of the original SAM's parameter count. Integrating this graph-based feature adaptation with Low-Rank Adaptation (LoRA) creates a complementary optimization framework that effectively captures both local and global dependencies in image embeddings while preserving model stability and parameter efficiency. Experimental results on a challenging chickpea pod dataset demonstrated that DSGA with LoRA achieved superior performance across multiple metrics evaluated under 2, 4, 8, and 10 shots, with progressive performance gains as shot count increased. Quantitative metrics showed a 17.31% improvement in Structure-measure and a 62.36% gain in adaptive F-measure compared to the baseline SAM fine-tuning. Comprehensive ablation studies and visualization analyses through Grad-CAM and t-SNE validated the framework's effectiveness in feature discrimination. The proposed adaptation demonstrated practical utility for automated agricultural monitoring applications, achieving accurate pod counting with an adjusted R-squared of 0.8987 for images with 10 to 120 pods under challenging field conditions.
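The LoRA half of the framework keeps SAM's weights frozen and learns only a low-rank residual W + (alpha/r)·BA. A generic sketch of such a layer is below; the layer dimensions and rank are illustrative, and where DSGA attaches these adapters inside SAM is not specified here.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update.

    Generic LoRA sketch (hypothetical rank and scaling): forward computes
    base(x) + (alpha / r) * x @ A^T @ B^T, where only A and B are trained.
    """

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original SAM weights stay frozen
        # A gets a small random init; B starts at zero so the adapted
        # layer initially matches the frozen base exactly.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)


layer = LoRALinear(nn.Linear(768, 768), r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
# 2 * 4 * 768 = 6144 trainable values vs. 590592 frozen ones in the base layer,
# the same order of saving that lets DSGA train only 4.26% of SAM.
```

Because B is zero-initialized, adaptation starts from the pretrained model's behavior and only gradually departs from it, which is one reason LoRA pairs well with few-shot settings like the 2-10-shot regime evaluated here.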