Towards Gold-Standard Depth Estimation for Tree Branches in UAV Forestry: Benchmarking Deep Stereo Matching Methods

📅 2026-01-27
🤖 AI Summary
This study addresses two gaps: existing depth estimation algorithms generalize poorly to vegetation-dense environments, and no dedicated evaluation benchmark exists for such scenes. We present the first high-resolution stereo dataset for tree-branch depth estimation tailored to UAV-based forestry operations, comprising 5,313 image pairs. Without any fine-tuning, we conduct a zero-shot cross-domain evaluation of eight state-of-the-art stereo matching methods spanning diverse paradigms: iterative optimization, foundation models, diffusion models, and 3D CNNs. Experimental results show that DEFOM achieves the best average rank, 1.75, across the standard benchmarks (ETH3D, KITTI, and Middlebury), and qualitative results on our newly introduced dataset show its superior performance in branch-centric scenes. Consequently, DEFOM is established as the gold-standard baseline for depth estimation in vegetated environments, and its predictions will serve as pseudo-ground-truth to facilitate future research.
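The summary reports DEFOM's average rank of 1.75. The sketch below assumes that "average rank" means the mean of a method's per-benchmark ranks, where rank 1 goes to the lowest error; the paper's exact tie-breaking rule is not stated, so breaking ties by name is an assumption. The function name `rank_methods`, the benchmark names, the method names, and all error values are hypothetical placeholders, not results from the paper.

```python
def rank_methods(errors_by_benchmark):
    """Return each method's average rank across benchmarks (1 = lowest error).

    errors_by_benchmark maps benchmark name -> {method name: error in px}.
    Ties are broken alphabetically by method name (an assumption).
    """
    ranks = {}
    for benchmark, errors in errors_by_benchmark.items():
        # Lower pixel error is better, so sort ascending by error.
        ordered = sorted(errors, key=lambda m: (errors[m], m))
        for position, method in enumerate(ordered, start=1):
            ranks.setdefault(method, []).append(position)
    # Average the per-benchmark positions for each method.
    return {method: sum(r) / len(r) for method, r in ranks.items()}


# Hypothetical placeholder errors (px), NOT results from the paper.
example = {
    "bench_1": {"method_a": 0.30, "method_b": 0.25, "method_c": 0.40},
    "bench_2": {"method_a": 1.10, "method_b": 1.30, "method_c": 1.20},
    "bench_3": {"method_a": 4.70, "method_b": 5.00, "method_c": 6.80},
    "bench_4": {"method_a": 0.90, "method_b": 0.80, "method_c": 1.50},
}
print(rank_methods(example))
# {'method_b': 1.75, 'method_a': 1.5, 'method_c': 2.75}
```

Averaging ranks rather than raw errors keeps benchmarks with large pixel errors, such as Middlebury in the abstract, from dominating the comparison.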

📝 Abstract
Autonomous UAV forestry operations require robust depth estimation with strong cross-domain generalization, yet existing evaluations focus on urban and indoor scenarios, leaving a critical gap for vegetation-dense environments. We present the first systematic zero-shot evaluation of eight stereo methods spanning iterative refinement, foundation model, diffusion-based, and 3D CNN paradigms. All methods use officially released pretrained weights (trained on Scene Flow) and are evaluated on four standard benchmarks (ETH3D, KITTI 2012/2015, Middlebury) plus a novel 5,313-pair Canterbury Tree Branches dataset (1920 × 1080). Results reveal scene-dependent patterns: foundation models excel on structured scenes (BridgeDepth: 0.23 px on ETH3D; DEFOM: 4.65 px on Middlebury), while iterative methods show variable cross-benchmark performance (IGEV++: 0.36 px on ETH3D but 6.77 px on Middlebury; IGEV: 0.33 px on ETH3D but 4.99 px on Middlebury). Qualitative evaluation on the Tree Branches dataset establishes DEFOM as the gold-standard baseline for vegetation depth estimation, with superior cross-domain consistency (consistently ranking 1st-2nd across benchmarks, average rank 1.75). DEFOM predictions will serve as pseudo-ground-truth for future benchmarking.
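The abstract reports errors in pixels (px) without naming the metric. Assuming these are mean end-point errors (EPE) on disparity, the sketch below shows one way to score a predicted disparity map against a reference map, such as a DEFOM pseudo-ground-truth map. It also converts disparity to metric depth with the standard rectified-stereo relation Z = f·B/d. The function names, the rule that non-positive reference disparities are invalid, and the focal length and baseline values are all hypothetical.

```python
import numpy as np


def end_point_error(pred_disp, ref_disp, valid=None):
    """Mean absolute disparity error (px) over valid pixels."""
    pred = np.asarray(pred_disp, dtype=np.float64)
    ref = np.asarray(ref_disp, dtype=np.float64)
    if valid is None:
        # Assumption: non-positive reference disparities mark missing pixels.
        valid = ref > 0
    return float(np.abs(pred - ref)[valid].mean())


def disparity_to_depth(disp, focal_px, baseline_m):
    """Convert disparity (px) to depth (m) via Z = f * B / d for a rectified pair."""
    disp = np.asarray(disp, dtype=np.float64)
    # Pixels with zero or negative disparity have no finite depth.
    depth = np.full_like(disp, np.inf)
    positive = disp > 0
    depth[positive] = focal_px * baseline_m / disp[positive]
    return depth


# Toy 2x2 maps; "ref" stands in for a pseudo-ground-truth disparity map.
ref = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[11.0, 19.0], [5.0, 40.0]])
print(end_point_error(pred, ref))  # errors 1, 1, 0 on 3 valid pixels -> ~0.667
print(disparity_to_depth(ref, focal_px=1000.0, baseline_m=0.1))
```

When the reference is a model prediction rather than measured ground truth, this score measures agreement with DEFOM, not absolute accuracy.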

Problem

depth estimation
UAV forestry
stereo matching
cross-domain generalization
vegetation-dense environments

Innovation

zero-shot depth estimation
stereo matching
foundation model
vegetation-dense environments
UAV forestry

Yida Lin
Centre for Data Science and Artificial Intelligence, Victoria University of Wellington, Wellington, New Zealand
Bing Xue
Meta Superintelligence Labs
LLM, machine learning for healthcare, representation learning, generative models
Mengjie Zhang
Centre for Data Science and Artificial Intelligence, Victoria University of Wellington, Wellington, New Zealand
Sam Schofield
Department of Computer Science and Software Engineering, University of Canterbury, Canterbury, New Zealand
Richard Green
Department of Computer Science and Software Engineering, University of Canterbury, Canterbury, New Zealand