AI Summary
To address the need for real-time, accurate scene understanding in low-altitude unstructured environments for autonomous UAV navigation, this paper proposes a lightweight end-to-end joint learning framework that simultaneously performs semantic segmentation and monocular depth estimation from aerial imagery. The architecture employs a shared encoder with task-specific decoders, integrating multi-scale feature fusion and cross-task attention mechanisms to balance accuracy and efficiency. Evaluated on the MidAir and AeroScapes benchmarks, our method achieves state-of-the-art accuracy while maintaining an inference speed of 20.2 FPS and low GPU memory consumption. It outperforms both single-task baselines and existing joint-learning approaches in both accuracy and computational efficiency. The source code is publicly available.
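The shared-encoder, dual-decoder design described above can be illustrated with a minimal PyTorch sketch. This is a hypothetical toy model, not the authors' implementation: the layer sizes, module names (`SharedEncoder`, `JointNet`), and class count are illustrative assumptions, and the multi-scale fusion and cross-task attention modules are omitted for brevity.

```python
# Hypothetical sketch of a joint architecture: one shared encoder feeding
# two task-specific decoders (semantic segmentation + monocular depth).
# Sizes and names are illustrative, not the paper's actual implementation.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Downsamples the input image into a shared feature map (stride 4)."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Upsamples shared features back to input resolution for one task."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(in_ch, in_ch // 2, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(in_ch // 2, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, f):
        return self.net(f)

class JointNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.encoder = SharedEncoder()
        self.seg_head = Decoder(64, num_classes)  # per-pixel class logits
        self.depth_head = Decoder(64, 1)          # per-pixel depth value

    def forward(self, x):
        f = self.encoder(x)  # features computed once, shared by both tasks
        return self.seg_head(f), self.depth_head(f)

x = torch.randn(1, 3, 64, 64)
seg, depth = JointNet()(x)
print(seg.shape, depth.shape)
```

Sharing the encoder is what gives joint methods their efficiency edge over single-task baselines: the most expensive feature extraction runs once per frame, while each lightweight decoder adds only a small per-task cost.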
Abstract
Understanding the geometric and semantic properties of a scene is crucial for autonomous navigation, and particularly challenging in the case of Unmanned Aerial Vehicle (UAV) navigation. Such information may be obtained by estimating depth and semantic segmentation maps of the surrounding environment; for practical use in autonomous navigation, the estimation must be performed as close to real time as possible. In this paper, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture that performs the two tasks accurately and rapidly, and validate its effectiveness on the MidAir and AeroScapes benchmark datasets. Our joint architecture is competitive with or superior to other single-task and joint methods while running fast, predicting at 20.2 FPS on a single NVIDIA Quadro P5000 GPU, and it has a low memory footprint. The code for training and prediction is available at: https://github.com/Malga-Vision/Co-SemDepth