🤖 AI Summary
This work proposes a hybrid reinforcement learning architecture, inspired by human neural mechanisms, to address a core challenge for autonomous agents in uncertain environments: balancing rapid responsiveness with goal-directed planning. Integrating Pavlovian conditioning, model-free instrumental learning, and model-based reasoning, the framework leverages environmental spatial features as conditioned stimuli to generate intrinsic value signals. A motivation-modulated Bayesian arbitration mechanism dynamically coordinates these strategies based on contextual uncertainty. Experimental results demonstrate that the approach significantly accelerates learning, enhances navigation safety, reduces unproductive exploration in high-uncertainty regions, and enables a smooth transition from exploratory behavior to planning-driven control.
📝 Abstract
Autonomous agents operating in uncertain environments must balance fast responses with goal-directed planning. Classical model-free (MF) reinforcement learning (RL) often converges slowly and may induce unsafe exploration, whereas model-based (MB) methods are computationally expensive and sensitive to model mismatch. This paper presents a human-inspired hybrid RL architecture integrating Pavlovian, instrumental MF, and instrumental MB components. Drawing on Pavlovian and instrumental learning from neuroscience, the framework uses contextual radio cues, i.e., georeferenced environmental features acting as conditioned stimuli (CS), to shape intrinsic value signals and bias decision-making. Learning is further modulated by internal motivational drives through a dedicated motivational signal. A Bayesian arbitration mechanism adaptively blends MF and MB estimates according to their predicted reliability. Simulation results show that the hybrid approach accelerates learning, improves operational safety, and reduces navigation through high-uncertainty regions compared with standard RL baselines. Pavlovian conditioning promotes safer exploration and faster convergence, while arbitration enables a smooth transition from exploration to efficient, plan-driven exploitation. Overall, the results highlight the benefits of biologically inspired modularity for robust, adaptive autonomous systems under uncertainty.
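The abstract does not give the arbitration equations, but the idea of blending MF and MB value estimates by predicted reliability can be illustrated with a minimal sketch. Here, each system's reliability is assumed to decay exponentially with its recent absolute prediction error (the function names, the `beta` sensitivity parameter, and the exponential reliability model are illustrative assumptions, not the paper's actual formulation):

```python
import math

def arbitration_weight(pe_mf, pe_mb, prior_mb=0.5, beta=5.0):
    """Hypothetical Bayesian-style arbitration: convert each system's
    recent absolute prediction error into a reliability score
    exp(-beta * |error|), then return the posterior weight assigned
    to the model-based (MB) system given a prior preference."""
    rel_mf = math.exp(-beta * abs(pe_mf))  # MF reliability (illustrative model)
    rel_mb = math.exp(-beta * abs(pe_mb))  # MB reliability (illustrative model)
    return prior_mb * rel_mb / (prior_mb * rel_mb + (1 - prior_mb) * rel_mf)

def blended_q(q_mf, q_mb, w_mb):
    """Blend per-action MF and MB value estimates with the
    arbitration weight w_mb in [0, 1]."""
    return [w_mb * b + (1 - w_mb) * f for f, b in zip(q_mf, q_mb)]

# When the MB system predicts well and the MF system does not,
# control shifts toward the MB estimates.
w = arbitration_weight(pe_mf=1.0, pe_mb=0.0)
q = blended_q([0.2, 0.8], [0.9, 0.1], w)
```

Under this sketch, equal prediction errors leave the weight at its prior, while a large MF error pushes control toward the MB system, matching the described transition from exploration to plan-driven exploitation.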