Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses poor early-stage safety and low sample efficiency in unmanned aerial vehicle (UAV) search-and-rescue missions trained with limited simulation. The authors propose a hierarchical decision-making framework. At the high level, a fixed, interpretable rule-based advisor supplies recommended actions, avoided actions, and regime-dependent arbitration weights. At the low level, a goal-conditioned reinforcement learning controller learns online from task-defined dense rewards and reuses experience through mode-aware prioritized replay augmented with rule-derived metadata. The framework is also tested under a strict no-pretraining deployment regime. On battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments, the method improves early safety and sample efficiency, primarily by reducing collision terminations, while preserving online adaptation to scenario-specific dynamics.
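The high-level guidance described above can be pictured with a minimal Python sketch. It is not the authors' implementation: the rule thresholds, the action indices, the `Advice` container, and the linear blending in `arbitrate` are all assumptions chosen only to show how recommended actions, avoided actions, and an arbitration weight might combine with a learned controller's action values.

```python
from dataclasses import dataclass, field


@dataclass
class Advice:
    """Hypothetical container for one piece of high-level guidance."""
    recommended: set = field(default_factory=set)  # action indices to favor
    avoided: set = field(default_factory=set)      # action indices to mask out
    weight: float = 0.0                            # arbitration weight in [0, 1]


def advise(obstacle_dist, battery):
    """Deterministic rules; thresholds and action ids are illustrative only."""
    if obstacle_dist < 1.0:   # safety regime: steer away, trust the rule strongly
        return Advice(recommended={2}, avoided={0}, weight=0.9)
    if battery < 0.2:         # low-battery regime: favor a "return" action
        return Advice(recommended={3}, weight=0.6)
    return Advice(weight=0.1)  # nominal regime: mostly defer to the learner


def arbitrate(q_values, advice, bonus=1.0):
    """Blend learned action values with the rule prior and pick an action."""
    scores = []
    for action, q in enumerate(q_values):
        if action in advice.avoided:
            scores.append(float("-inf"))  # avoided actions are never selected
            continue
        prior = bonus if action in advice.recommended else 0.0
        scores.append((1.0 - advice.weight) * q + advice.weight * prior)
    return max(range(len(scores)), key=scores.__getitem__)


q = [1.0, 0.2, 0.1, 0.0]  # made-up action values from the low-level controller
near_obstacle = arbitrate(q, advise(obstacle_dist=0.5, battery=0.8))
nominal = arbitrate(q, advise(obstacle_dist=5.0, battery=0.8))
low_battery = arbitrate(q, advise(obstacle_dist=5.0, battery=0.1))
```

In this sketch a large weight lets the rule dominate in safety-critical regimes, while a small weight leaves the choice to the learned values; the paper's actual arbitration scheme may differ.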

📝 Abstract

This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. We evaluate the framework on two tasks: battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments. Across both tasks, the proposed method improves early safety and sample efficiency primarily by reducing collision terminations, while preserving the ability to adapt online to scenario-specific dynamics.
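As a rough illustration of the replay mechanism mentioned in the abstract, the sketch below stores each transition with rule-derived metadata and samples in proportion to a priority that depends on the advisor's mode. The class name, the `mode_boost` table, the metadata fields, and the priority formula are assumptions made for illustration; the abstract does not specify how priorities are computed.

```python
import random


class ModeAwareReplay:
    """Toy prioritized replay whose priorities depend on the advisor mode."""

    def __init__(self, mode_boost, eps=1e-3, seed=0):
        self.mode_boost = dict(mode_boost)  # hypothetical per-mode multipliers
        self.eps = eps                      # keeps every priority positive
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, mode, td_error, followed_advice):
        # Rule-derived metadata travels with the transition.
        meta = {"mode": mode, "followed_advice": followed_advice}
        priority = (abs(td_error) + self.eps) * self.mode_boost.get(mode, 1.0)
        self.items.append((transition, meta))
        self.priorities.append(priority)

    def sample(self, k):
        # Sample with replacement, proportionally to priority.
        return self.rng.choices(self.items, weights=self.priorities, k=k)


buf = ModeAwareReplay({"avoid": 3.0, "cruise": 1.0})
buf.add("t0", mode="cruise", td_error=0.5, followed_advice=True)
buf.add("t1", mode="avoid", td_error=0.5, followed_advice=False)
batch = buf.sample(5)
```

With equal TD errors, the transition recorded in the hypothetical "avoid" mode is three times as likely to be replayed, which is one simple way safety-relevant experience could be revisited more often.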

Problem

Research questions and friction points this paper is trying to address.

search-and-rescue

limited-simulation training

goal-conditioned reinforcement learning

UAV missions

sample efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

rule-based coaching

goal-conditioned reinforcement learning

hierarchical decision-making