🤖 AI Summary
This study addresses the challenge of severe noise in dense disparity maps within complex forest canopies, which hinders branch-level 3D reconstruction and limits autonomous drone-based pruning. To overcome this, we propose a progressive pipeline that integrates DEFOM-Stereo disparity estimation, SAM3 instance segmentation, and multi-stage depth refinement to sequentially mitigate mask boundary contamination, segmentation errors, and depth noise. Key innovations include skeleton-preserving mask refinement, LAB-space Mahalanobis distance-based color validation, and a five-stage robust depth optimization strategy. Evaluated on UAV imagery of New Zealand radiata pine stands, our method reduces per-branch depth standard deviation by 82%, substantially improving geometric consistency and edge fidelity, and yields high-quality point clouds suitable for precise pruning localization.
📄 Abstract
Accurate per-branch 3D reconstruction is a prerequisite for autonomous UAV-based tree pruning; however, dense disparity maps from modern stereo matchers often remain too noisy for individual branch analysis in complex forest canopies. This paper introduces a progressive pipeline integrating DEFOM-Stereo foundation-model disparity estimation, SAM3 instance segmentation, and multi-stage depth optimization to deliver robust per-branch point clouds. Starting from a naive baseline, we systematically identify and resolve three error families through successive refinements. Mask boundary contamination is first addressed through morphological erosion and subsequently refined via a skeleton-preserving variant to safeguard thin-branch topology. Segmentation inaccuracy is then mitigated using LAB-space Mahalanobis color validation coupled with cross-branch overlap arbitration. Finally, depth noise - the most persistent error source - is initially reduced by outlier removal and median filtering, before being superseded by a robust five-stage scheme comprising MAD global detection, spatial density consensus, local MAD filtering, RGB-guided filtering, and adaptive bilateral filtering. Evaluated on 1920×1080 stereo imagery of radiata pine (Pinus radiata) acquired with a ZED Mini camera (63 mm baseline) from a UAV in Canterbury, New Zealand, the proposed pipeline reduces the average per-branch depth standard deviation by 82% while retaining edge fidelity. The result is geometrically coherent 3D point clouds suitable for autonomous pruning tool positioning. All code and processed data are publicly released to facilitate further UAV forestry research.
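The abstract's five-stage depth scheme opens with MAD (median absolute deviation) global outlier detection. A minimal sketch of such a detector is below; the function name, the scale constant 1.4826 (which makes MAD consistent with the standard deviation for Gaussian data), and the default cutoff `k=3.0` are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def mad_outlier_mask(depths, k=3.0):
    """Flag depth samples whose deviation from the median exceeds
    k robust standard deviations, where sigma = 1.4826 * MAD."""
    depths = np.asarray(depths, dtype=float)
    med = np.median(depths)
    mad = np.median(np.abs(depths - med))
    sigma = 1.4826 * mad  # MAD scaled to match std for Gaussian data
    if sigma == 0:
        # Degenerate case: all samples (nearly) identical, flag nothing.
        return np.zeros(depths.shape, dtype=bool)
    return np.abs(depths - med) > k * sigma
```

Because both the center (median) and the spread (MAD) are robust statistics, a few grossly wrong depth values cannot inflate the threshold and mask themselves, which is why MAD is preferred over a mean/std rule for this first global pass.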
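The LAB-space Mahalanobis color validation step can be sketched as a per-pixel gate against a reference branch-color distribution. The sketch below assumes pixels have already been converted to LAB (e.g. via OpenCV's `cvtColor`); the function name, the covariance regularization term, and the distance threshold of 3.0 are hypothetical choices for illustration.

```python
import numpy as np

def mahalanobis_color_gate(lab_pixels, lab_reference, threshold=3.0):
    """Keep pixels whose Mahalanobis distance to the reference
    branch color distribution (in LAB space) is below a threshold.

    lab_pixels:    (N, 3) candidate pixel colors in LAB space.
    lab_reference: (M, 3) known-good branch pixel colors in LAB space.
    Returns an (N,) boolean mask of pixels that pass validation.
    """
    mu = lab_reference.mean(axis=0)
    cov = np.cov(lab_reference, rowvar=False)
    cov = cov + 1e-6 * np.eye(3)  # regularize a near-singular covariance
    inv_cov = np.linalg.inv(cov)
    diff = lab_pixels - mu
    # Squared Mahalanobis distance per pixel: d_i^2 = diff_i @ inv_cov @ diff_i
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return np.sqrt(d2) < threshold
```

Working in LAB rather than RGB separates lightness from chroma, so the covariance can absorb shading variation along a branch while the distance still rejects pixels whose chromaticity belongs to foliage or background.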