🤖 AI Summary
To address the challenge of autonomously acquiring critical information under dynamic priority shifts in high-risk search-and-rescue (SAR) scenarios, this paper proposes CA-MIQ, a lightweight dual-critic reinforcement learning framework. Methodologically, it integrates dual-critic Q-learning, intrinsic reward modeling, state-novelty estimation, priority-aware information-gain measurement, and adaptation to piecewise-stationary distributions. Its key contributions are: (1) an intrinsic critic that jointly models state novelty, information-location awareness, and real-time priority alignment; and (2) a priority-shift detection mechanism that triggers a transient exploration boost and a selective critic reset. Experiments in multi-priority-switching simulations show that CA-MIQ achieves a nearly fourfold improvement in task success rate after a single priority switch and maintains 100% recovery under repeated switches, whereas baseline methods fail completely.
📝 Abstract
Autonomous systems operating in high-stakes search-and-rescue (SAR) missions must continuously gather mission-critical information while flexibly adapting to shifting operational priorities. We propose CA-MIQ (Context-Aware Max-Information Q-learning), a lightweight dual-critic reinforcement learning (RL) framework that dynamically adjusts its exploration strategy whenever mission priorities change. CA-MIQ pairs a standard extrinsic critic for task reward with an intrinsic critic that fuses state novelty, information-location awareness, and real-time priority alignment. A built-in shift detector triggers transient exploration boosts and selective critic resets, allowing the agent to re-focus after a priority revision. In a simulated SAR grid-world, where experiments specifically test adaptation to changes in the priority ordering of the information types the agent must focus on, CA-MIQ achieves nearly four times higher mission-success rates than baselines after a single priority shift and more than three times better performance in multiple-shift scenarios, achieving 100% recovery while baseline methods fail to adapt. These results highlight CA-MIQ's effectiveness in any discrete environment with piecewise-stationary information-value distributions.
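The dual-critic scheme described above can be sketched in a few lines of tabular Q-learning. This is an illustrative reconstruction, not the paper's implementation: the class name, hyperparameters, the exact intrinsic-reward mix (inverse-visit-count novelty plus a priority-alignment signal assumed to come from the environment), and the epsilon-boost values are all assumptions.

```python
import random
from collections import defaultdict

class CAMIQAgent:
    """Hedged sketch of a dual-critic Q-learner in the spirit of CA-MIQ.

    Hyperparameter names and the intrinsic-reward composition are
    illustrative assumptions, not the paper's specification.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95,
                 beta=0.5, eps=0.1, boost_eps=0.9):
        self.n_actions = n_actions
        self.q_ext = defaultdict(lambda: [0.0] * n_actions)  # extrinsic critic
        self.q_int = defaultdict(lambda: [0.0] * n_actions)  # intrinsic critic
        self.visits = defaultdict(int)                       # state-novelty counts
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.eps, self.base_eps, self.boost_eps = eps, eps, boost_eps

    def act(self, state):
        # Epsilon-greedy over the combined extrinsic + intrinsic values.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        combined = [e + self.beta * i
                    for e, i in zip(self.q_ext[state], self.q_int[state])]
        return combined.index(max(combined))

    def intrinsic_reward(self, state, priority_alignment):
        # Novelty decays with visit count; priority_alignment is assumed
        # to be a [0, 1] signal measuring how well the visited information
        # matches the current priority ordering.
        novelty = 1.0 / (1 + self.visits[state])
        return novelty + priority_alignment

    def update(self, s, a, r_ext, priority_alignment, s_next, done):
        self.visits[s] += 1
        r_int = self.intrinsic_reward(s, priority_alignment)
        # One TD(0) update per critic, each against its own reward stream.
        for q, r in ((self.q_ext, r_ext), (self.q_int, r_int)):
            target = r + (0.0 if done else self.gamma * max(q[s_next]))
            q[s][a] += self.alpha * (target - q[s][a])
        self.eps = max(self.base_eps, self.eps * 0.99)  # decay any boost

    def on_priority_shift(self):
        # Shift detected: boost exploration transiently and reset the
        # intrinsic critic so stale priority estimates stop steering actions.
        self.eps = self.boost_eps
        self.q_int.clear()
```

The selective reset touches only the intrinsic critic: extrinsic task knowledge (navigation, dynamics) is preserved while the priority-dependent exploration values are relearned from scratch after each shift.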