🤖 AI Summary
This study addresses the limited generalization of existing depth estimation algorithms in vegetation-dense environments and the absence of dedicated evaluation benchmarks. We present the first high-resolution stereo dataset for trunk depth estimation tailored to UAV-based forestry operations, comprising 5,313 image pairs. Without any fine-tuning, we conduct a zero-shot cross-domain evaluation of eight state-of-the-art stereo matching methods spanning diverse paradigms, including iterative refinement, foundation models, diffusion models, and 3D CNNs. Experiments show that DEFOM achieves the best average rank of 1.75 across multiple benchmarks, including ETH3D, KITTI, Middlebury, and our newly introduced dataset, and delivers superior performance in trunk-centric scenes. We therefore establish DEFOM as the gold-standard baseline for depth estimation in vegetated environments and release its predictions as pseudo-ground-truth to facilitate future research.
📝 Abstract
Autonomous UAV forestry operations require robust depth estimation with strong cross-domain generalization, yet existing evaluations focus on urban and indoor scenarios, leaving a critical gap for vegetation-dense environments. We present the first systematic zero-shot evaluation of eight stereo methods spanning iterative-refinement, foundation-model, diffusion-based, and 3D CNN paradigms. All methods use officially released pretrained weights (trained on Scene Flow) and are evaluated on four standard benchmarks (ETH3D, KITTI 2012/2015, Middlebury) plus the novel 5,313-pair Canterbury Tree Branches dataset ($1920 \times 1080$). Results reveal scene-dependent patterns: foundation models excel on structured scenes (BridgeDepth: 0.23 px on ETH3D; DEFOM: 4.65 px on Middlebury), while iterative methods show variable cross-benchmark performance (IGEV++: 0.36 px on ETH3D but 6.77 px on Middlebury; IGEV: 0.33 px on ETH3D but 4.99 px on Middlebury). Qualitative evaluation on the Tree Branches dataset establishes DEFOM as the gold-standard baseline for vegetation depth estimation, owing to its superior cross-domain consistency (consistently ranking 1st or 2nd across benchmarks; average rank 1.75). DEFOM predictions will serve as pseudo-ground-truth for future benchmarking.
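The average-rank statistic quoted above (DEFOM: 1.75) aggregates per-benchmark rankings into a single cross-domain score. A minimal sketch of how such a statistic is typically computed follows, populated only with the error figures quoted in this abstract (the full per-benchmark tables are in the paper, so the resulting ranks here are illustrative, not the paper's):

```python
# Illustrative average-rank computation, using only the px-error
# figures quoted in the abstract (NOT the paper's full tables).
errors = {
    "ETH3D":      {"BridgeDepth": 0.23, "IGEV": 0.33, "IGEV++": 0.36},
    "Middlebury": {"DEFOM": 4.65, "IGEV": 4.99, "IGEV++": 6.77},
}

def average_ranks(errors):
    """Rank methods per benchmark (1 = lowest error), then average
    each method's rank over the benchmarks where it was evaluated."""
    ranks = {}
    for table in errors.values():
        ordered = sorted(table, key=table.get)  # ascending error
        for pos, method in enumerate(ordered, start=1):
            ranks.setdefault(method, []).append(pos)
    return {m: sum(r) / len(r) for m, r in ranks.items()}

print(average_ranks(errors))
# e.g. IGEV ranks 2nd on both benchmarks above -> average rank 2.0
```

With the full eight-method, four-benchmark tables from the paper, the same procedure would yield DEFOM's reported average rank of 1.75.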