Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images

📅 2025-03-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the need for real-time, accurate scene understanding in low-altitude unstructured environments for autonomous UAV navigation, this paper proposes a lightweight end-to-end joint learning framework that simultaneously performs semantic segmentation and monocular depth estimation from aerial imagery. The architecture employs a shared encoder with task-specific decoders, integrating multi-scale feature fusion and cross-task attention mechanisms to balance accuracy and efficiency. Evaluated on the MidAir and AeroScapes benchmarks, the method is competitive with or superior to both single-task baselines and existing joint-learning approaches, while running at 20.2 FPS on a single NVIDIA Quadro P5000 GPU with a low memory footprint. The source code is publicly available.
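The shared-encoder, two-decoder layout mentioned in the summary can be illustrated with a toy sketch. This is not the paper's network: the class `JointModel`, the per-pixel linear "encoder" (equivalent to a 1x1 convolution), the random weights, and all sizes are hypothetical stand-ins chosen so the example runs with the Python standard library alone. It only shows the structural idea that features are computed once and then consumed by a depth head and a segmentation head.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Random (rows x cols) matrix; a stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class JointModel:
    """Toy shared-encoder model with a depth head and a segmentation head."""

    def __init__(self, in_channels=3, feat_dim=8, num_classes=4, seed=0):
        rng = random.Random(seed)
        self.encoder = make_weights(feat_dim, in_channels, rng)   # shared by both tasks
        self.depth_head = make_weights(1, feat_dim, rng)          # task-specific
        self.seg_head = make_weights(num_classes, feat_dim, rng)  # task-specific

    def encode(self, pixel):
        # Per-pixel linear map followed by ReLU.
        return [max(0.0, v) for v in linear(self.encoder, pixel)]

    def forward(self, image):
        depth_map, seg_map = [], []
        for row in image:
            depth_row, seg_row = [], []
            for pixel in row:
                feats = self.encode(pixel)  # computed once, used by both heads
                raw = linear(self.depth_head, feats)[0]
                depth_row.append(math.log1p(math.exp(raw)))  # softplus keeps depth > 0
                logits = linear(self.seg_head, feats)
                seg_row.append(logits.index(max(logits)))    # argmax class index
            depth_map.append(depth_row)
            seg_map.append(seg_row)
        return depth_map, seg_map


# Hypothetical 4x5 RGB image with values in [0, 1).
rng = random.Random(1)
image = [[[rng.random() for _ in range(3)] for _ in range(5)] for _ in range(4)]

model = JointModel()
depth_map, seg_map = model.forward(image)
print(len(depth_map), len(depth_map[0]))  # spatial size of the depth map
print(len(seg_map), len(seg_map[0]))      # spatial size of the label map
```

Sharing the encoder means the most expensive part of the forward pass runs once per frame rather than once per task, which is the usual motivation for joint architectures when speed and memory are constrained.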

📝 Abstract
Understanding the geometric and semantic properties of the scene is crucial in autonomous navigation and particularly challenging in the case of Unmanned Aerial Vehicle (UAV) navigation. Such information may be obtained by estimating depth and semantic segmentation maps of the surrounding environment. For practical use in autonomous navigation, the procedure must be performed as close to real-time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture that can perform the two tasks accurately and rapidly, and validate its effectiveness on the MidAir and Aeroscapes benchmark datasets. Our joint architecture proves to be competitive with or superior to other single-task and joint-architecture methods while running fast, at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, with a low memory footprint. All code for training and prediction can be found at: https://github.com/Malga-Vision/Co-SemDepth
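To put the reported 20.2 FPS in perspective, the short calculation below converts it into a per-frame time budget and into the distance a UAV covers between two consecutive predictions. Only the 20.2 FPS figure comes from the abstract; the 5 m/s flight speed and the function names are assumptions made for illustration.

```python
def frame_time_ms(fps):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


def distance_per_frame_m(fps, speed_m_s):
    """Distance travelled between two consecutive predictions, in metres."""
    return speed_m_s / fps


FPS = 20.2               # reported in the abstract
ASSUMED_SPEED_M_S = 5.0  # hypothetical UAV speed, not from the paper

print(f"{frame_time_ms(FPS):.1f} ms per frame")                                   # 49.5 ms per frame
print(f"{distance_per_frame_m(FPS, ASSUMED_SPEED_M_S):.2f} m between predictions")  # 0.25 m between predictions
```

Under these assumptions, each depth and segmentation update arrives roughly every 50 ms, during which the vehicle moves about a quarter of a metre.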
Problem

Research questions and friction points this paper is trying to address.

Joint semantic segmentation and depth estimation for UAV navigation
Fast real-time processing for autonomous aerial systems
Low-altitude unstructured environment mapping using monocular cameras
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint deep-learning for depth and semantic segmentation
Fast prediction at 20.2 FPS on a single NVIDIA Quadro P5000 GPU
Low memory footprint for UAV navigation