🤖 AI Summary
This study investigates how environmental factors, specifically the frequency of exposure to a shortcut and a navigation cue, affect representation learning in deep reinforcement learning (DRL) agents. Method: We design a simulated dual-path navigation task motivated by the Dual Solutions Paradigm used with human navigators, combining DRL training, population-level neural activity analysis, trajectory encoding modeling, and behavioral quantification. Contribution/Results: (1) Spatial representations emerge and stabilize early in training, before navigation strategies fully develop, suggesting representational stability is a necessary foundation; (2) population neural activity preferentially encodes the planned trajectory over the agent's instantaneous position; (3) active use of a cue during navigation planning produces stronger cue representations than passive exposure. All agents reach optimal performance on trials in which the shortcut is closed, while agents with higher shortcut exposure adopt the open shortcut significantly faster. This work is the first to demonstrate, within a DRL framework, the critical roles of representational stability and trajectory-level population coding in human-like navigation.
📝 Abstract
We developed a simulated environment to train deep reinforcement learning agents on a shortcut-usage navigation task, motivated by the Dual Solutions Paradigm test used with human navigators. We manipulated the frequency with which agents were exposed to a shortcut and a navigation cue to investigate how these factors influence the development of shortcut usage. We find that all agents rapidly achieve optimal performance on trials in which the shortcut is closed once initial learning begins. However, agents with higher shortcut exposure navigate faster and adopt the shortcut sooner when it is open. Analysis of the agents' artificial neural network activity revealed that frequent presentation of a cue initially produced better encoding of the cue in the activity of individual nodes than in agents that encountered the cue less often. Ultimately, however, stronger cue representations were formed through use of the cue in navigation planning rather than through mere exposure. In all agents, spatial representations develop early in training and subsequently stabilize before navigation strategies fully develop, suggesting that spatially consistent activations are necessary for basic navigation but insufficient for advanced strategies. Further, using new analysis techniques, we found that the agent's network encodes the planned trajectory rather than the agent's immediate location, and that this encoding resides at the population level rather than in individual nodes. These techniques could have broader applications in studying activity across populations of neurons or network nodes beyond individual activity patterns.