🤖 AI Summary
To address the challenge of autonomously acquiring critical information under dynamic priority shifts in high-risk search-and-rescue (SAR) scenarios, this paper proposes CA-MIQ, a lightweight dual-critic reinforcement learning framework. Methodologically, it integrates dual-critic Q-learning, intrinsic reward modeling, state-novelty estimation, priority-aware information-gain measurement, and adaptation to piecewise-stationary distributions. Its key contributions are: (1) an intrinsic critic that jointly models state novelty, information-location awareness, and real-time priority alignment; and (2) a priority-shift detection mechanism that triggers a transient exploration boost and a selective critic reset. Experiments in multi-priority-switching simulations show that CA-MIQ achieves a nearly fourfold improvement in task success rate after a single priority switch and maintains 100% recovery under repeated switches, whereas baseline methods fail completely.
📝 Abstract
Autonomous systems operating in high-stakes search-and-rescue (SAR) missions must continuously gather mission-critical information while flexibly adapting to shifting operational priorities. We propose CA-MIQ (Context-Aware Max-Information Q-learning), a lightweight dual-critic reinforcement learning (RL) framework that dynamically adjusts its exploration strategy whenever mission priorities change. CA-MIQ pairs a standard extrinsic critic for task reward with an intrinsic critic that fuses state novelty, information-location awareness, and real-time priority alignment. A built-in shift detector triggers transient exploration boosts and selective critic resets, allowing the agent to re-focus after a priority revision. In a simulated SAR grid-world, where experiments specifically test adaptation to changes in the priority ordering of the information types the agent must focus on, CA-MIQ achieves nearly four times higher mission-success rates than baselines after a single priority shift and more than three times better performance in multiple-shift scenarios, achieving 100% recovery while baseline methods fail to adapt. These results highlight CA-MIQ's effectiveness in any discrete environment with piecewise-stationary information-value distributions.
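The dual-critic scheme described above can be sketched in a few lines of tabular Q-learning. This is an illustrative reconstruction, not the paper's implementation: the class name, hyperparameters, the exact intrinsic-reward mix (inverse-visit-count novelty plus a priority-alignment signal assumed to come from the environment), and the epsilon-boost values are all assumptions.

```python
import random
from collections import defaultdict

class CAMIQAgent:
    """Hedged sketch of a dual-critic Q-learner in the spirit of CA-MIQ.

    Hyperparameter names and the intrinsic-reward composition are
    illustrative assumptions, not the paper's specification.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95,
                 beta=0.5, eps=0.1, boost_eps=0.9):
        self.n_actions = n_actions
        self.q_ext = defaultdict(lambda: [0.0] * n_actions)  # extrinsic critic
        self.q_int = defaultdict(lambda: [0.0] * n_actions)  # intrinsic critic
        self.visits = defaultdict(int)                       # state-novelty counts
        self.alpha, self.gamma, self.beta = alpha, gamma, beta
        self.eps, self.base_eps, self.boost_eps = eps, eps, boost_eps

    def act(self, state):
        # Epsilon-greedy over the combined extrinsic + intrinsic values.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        combined = [e + self.beta * i
                    for e, i in zip(self.q_ext[state], self.q_int[state])]
        return combined.index(max(combined))

    def intrinsic_reward(self, state, priority_alignment):
        # Novelty decays with visit count; priority_alignment is assumed
        # to be a [0, 1] signal measuring how well the visited information
        # matches the current priority ordering.
        novelty = 1.0 / (1 + self.visits[state])
        return novelty + priority_alignment

    def update(self, s, a, r_ext, priority_alignment, s_next, done):
        self.visits[s] += 1
        r_int = self.intrinsic_reward(s, priority_alignment)
        # One TD(0) update per critic, each against its own reward stream.
        for q, r in ((self.q_ext, r_ext), (self.q_int, r_int)):
            target = r + (0.0 if done else self.gamma * max(q[s_next]))
            q[s][a] += self.alpha * (target - q[s][a])
        self.eps = max(self.base_eps, self.eps * 0.99)  # decay any boost

    def on_priority_shift(self):
        # Shift detected: boost exploration transiently and reset the
        # intrinsic critic so stale priority estimates stop steering actions.
        self.eps = self.boost_eps
        self.q_int.clear()
```

The selective reset touches only the intrinsic critic: extrinsic task knowledge (navigation, dynamics) is preserved while the priority-dependent exploration values are relearned from scratch after each shift.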