Parameter-Efficient Fine-Tuning of Vision Foundation Model for Forest Floor Segmentation from UAV Imagery

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of forest floor segmentation from UAV remote sensing imagery, including high natural variability, ambiguous annotations, and ill-defined class boundaries, by adapting the Segment Anything Model (SAM) to fine-grained ground-cover categories (e.g., tree stumps, understory vegetation, woody debris). The authors propose a lightweight parameter-efficient fine-tuning (PEFT) framework centered on an adapter-enhanced SAM mask decoder, enabling class-aware automatic segmentation without manual prompting. The mask decoder is customized with two complementary PEFT strategies, Adapter and LoRA, and trained end-to-end on multi-temporal UAV forest imagery. Experiments show that the Adapter variant achieves the highest mIoU, while LoRA attains competitive performance with far fewer trainable parameters and lower computational overhead, making it well suited to onboard edge deployment.

📝 Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly used for reforestation and forest monitoring, including seed dispersal in hard-to-reach terrains. However, a detailed understanding of the forest floor remains a challenge due to high natural variability, quickly changing environmental parameters, and ambiguous annotations stemming from unclear class definitions. To address this issue, we adapt the Segment Anything Model (SAM), a vision foundation model with strong generalization capabilities, to segment forest floor objects such as tree stumps, vegetation, and woody debris. To this end, we employ parameter-efficient fine-tuning (PEFT) to train a small set of additional model parameters while keeping the original weights fixed. We adjust SAM's mask decoder to generate masks corresponding to our dataset categories, allowing for automatic segmentation without manual prompting. Our results show that the adapter-based PEFT method achieves the highest mean intersection over union (mIoU), while Low-Rank Adaptation (LoRA), with fewer parameters, offers a lightweight alternative for resource-constrained UAV platforms.
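The LoRA variant described above can be sketched in a few lines. This is a minimal, hedged illustration of the general low-rank-update idea (frozen weight W plus a trainable product B·A scaled by alpha/r), not the paper's actual implementation; the dimensions, rank, and scaling are illustrative assumptions.

```python
import numpy as np

# Minimal LoRA sketch: the frozen weight W is augmented by a low-rank
# update (alpha / r) * B @ A, so only r * (d + k) parameters are trained
# instead of d * k. All dimensions here are illustrative assumptions.
rng = np.random.default_rng(0)
d, k, r, alpha = 256, 256, 8, 16          # rank r << d gives the parameter saving

W = rng.standard_normal((d, k))           # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                      # trainable up-projection, zero-initialized

def lora_forward(x):
    """y = x W^T + (alpha / r) * x (B A)^T: frozen path plus low-rank update."""
    return x @ W.T + (alpha / r) * (x @ (B @ A).T)

x = rng.standard_normal((1, k))
# With B initialized to zero, the adapted layer reproduces the frozen layer,
# so inserting LoRA does not perturb the pretrained model at step 0.
assert np.allclose(lora_forward(x), x @ W.T)

trainable_fraction = (A.size + B.size) / W.size
print(f"trainable fraction of layer parameters: {trainable_fraction:.4f}")
```

The zero initialization of B is the standard LoRA convention: it guarantees the fine-tuned model starts exactly at the pretrained solution, which is why such updates can be added to a frozen foundation model safely.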
Problem

Research questions and friction points this paper is trying to address.

Segment forest floor objects from UAV imagery
Address high variability and ambiguous annotations
Adapt SAM with parameter-efficient fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient fine-tuning of SAM
Adapter-based PEFT for highest mIoU
LoRA for lightweight UAV adaptation
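The adapter strategy from the list above can likewise be sketched as a residual bottleneck module. This is a generic illustration of the adapter pattern (down-project, nonlinearity, up-project, residual connection); the bottleneck width, activation, and initialization are assumptions, not the paper's exact configuration for SAM's mask decoder.

```python
import numpy as np

# Bottleneck adapter sketch: a small trainable module inserted into a frozen
# network. Only W_down and W_up are trained. Dimensions are illustrative.
rng = np.random.default_rng(1)
d, bottleneck = 256, 32                   # down-project d -> bottleneck -> d

W_down = rng.standard_normal((bottleneck, d)) * 0.01  # trainable
W_up = np.zeros((d, bottleneck))          # zero init: adapter starts as identity

def adapter(h):
    """Residual bottleneck: h + up(relu(down(h)))."""
    z = np.maximum(h @ W_down.T, 0.0)     # ReLU nonlinearity in the bottleneck
    return h + z @ W_up.T

h = rng.standard_normal((4, d))
# With W_up at zero the adapter is an identity map, so it can be inserted
# into a frozen decoder without changing its initial behavior.
assert np.allclose(adapter(h), h)
```

Compared with LoRA's linear update, the adapter adds a nonlinearity, which is one plausible reason an Adapter variant can reach higher accuracy at the cost of more parameters, consistent with the trade-off the paper reports.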
Mohammad Wasil
Department of Computer Science, Institute for Artificial Intelligence and Autonomous Systems (A2S), Bonn-Rhein-Sieg University of Applied Sciences, Sankt Augustin, Germany
Ahmad Drak
Department of Computer Science, Institute of Technology, Resource and Energy-efficient Engineering (TREE), Bonn-Rhein-Sieg University of Applied Sciences, Sankt Augustin, Germany
Brennan Penfold
Department of Computer Science, Institute of Technology, Resource and Energy-efficient Engineering (TREE), Bonn-Rhein-Sieg University of Applied Sciences, Sankt Augustin, Germany
Ludovico Scarton
Department of Computer Science, Institute for Artificial Intelligence and Autonomous Systems (A2S), Institute of Technology, Resource and Energy-efficient Engineering (TREE), Bonn-Rhein-Sieg University of Applied Sciences, Sankt Augustin, Germany
Maximilian Johenneken
Department of Computer Science, Institute of Technology, Resource and Energy-efficient Engineering (TREE), Bonn-Rhein-Sieg University of Applied Sciences, Sankt Augustin, Germany
Alexander Asteroth
Professor of Computer Science, Bonn-Rhein-Sieg University of Applied Sciences
Machine Learning, Surrogate Modeling, Sports Informatics, Computational Learning Theory
Sebastian Houben
University of Applied Sciences Bonn-Rhein-Sieg
Real-time Computer Vision, Trustworthy AI