🤖 AI Summary
Problem: Real-time semantic navigation for resource-constrained autonomous aerial vehicles faces a fundamental trade-off between detection accuracy and energy efficiency, primarily due to the high computational cost of conventional 3D point cloud–based object detection.
Method: This paper proposes a lightweight semantic-guided navigation framework that eliminates expensive 3D reconstruction and instead introduces a novel Markov Decision Process (MDP) model integrating semantic perception and navigation decision-making. It directly maps video-based semantic cues—such as traffic signs—into navigability criteria, enabled by a feature-engineered lightweight detector and online policy optimization for closed-loop control.
Contribution/Results: Evaluated on simulation and hardware-in-the-loop (HIL) platforms, the framework achieves a 58% reduction in inference latency and a 42% decrease in system energy consumption compared to baseline methods, while limiting the drop in navigation accuracy to ≤2.3%. The approach improves real-time performance and energy efficiency without compromising functional reliability.
📝 Abstract
Most applications in autonomous navigation using mounted cameras rely on the construction and processing of geometric 3D point clouds, which is an expensive process. There is, however, a simpler way to make a space navigable quickly: using semantic information (e.g., traffic signs) to guide the agent. Yet detecting and acting on semantic information involves Computer Vision (CV) algorithms such as object detection, which are themselves demanding for agents such as aerial drones with limited onboard resources. To solve this problem, we introduce a novel Markov Decision Process (MDP) framework to reduce the workload of these CV approaches. We apply our proposed framework to both feature-based and neural-network-based object-detection tasks, using open-loop and closed-loop simulations as well as hardware-in-the-loop emulations. These holistic tests show significant benefits in energy consumption and speed with only a limited loss in accuracy compared to models based on static features and neural networks.
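
The abstract does not specify the states, actions, or rewards of the MDP. Purely as an illustration of how an MDP could trade detector workload against accuracy, the sketch below solves a toy problem by value iteration. Everything in it is an assumption made for this example rather than the paper's formulation: the state (frames since the detector last ran), the two actions (`skip`, `detect`), and all cost constants.

```python
"""Toy MDP: decide per frame whether to run an object detector or skip it.

Every state, action, cost, and constant here is a hypothetical placeholder,
not the formulation used in the paper.
"""

ACTIONS = ("skip", "detect")
MAX_STALE = 5       # state s = frames since the detector last ran, capped here
DETECT_COST = 3.0   # assumed energy cost of one detector run
STALE_COST = 1.0    # assumed accuracy penalty per frame of staleness
GAMMA = 0.9         # discount factor


def step(state, action):
    """Deterministic transition: return (next_state, immediate_cost)."""
    if action == "detect":
        return 0, DETECT_COST
    return min(state + 1, MAX_STALE), STALE_COST * state


def q_value(values, state, action):
    """Cost of taking `action` in `state`, then following `values`."""
    next_state, cost = step(state, action)
    return cost + GAMMA * values[next_state]


def value_iteration(tol=1e-9):
    """Iterate the Bellman optimality update until values stop changing."""
    values = [0.0] * (MAX_STALE + 1)
    while True:
        updated = [
            min(q_value(values, s, a) for a in ACTIONS)
            for s in range(MAX_STALE + 1)
        ]
        if max(abs(u - v) for u, v in zip(updated, values)) < tol:
            return updated
        values = updated


def greedy_policy(values):
    """Pick the cost-minimizing action in every state."""
    return [
        min(ACTIONS, key=lambda a: q_value(values, s, a))
        for s in range(MAX_STALE + 1)
    ]


policy = greedy_policy(value_iteration())
print(policy)
```

With these placeholder costs, the resulting policy skips the detector while its last result is still fresh and reruns it once the staleness penalty outweighs the detection cost. Changing `DETECT_COST` or `STALE_COST` shifts that threshold.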