🤖 AI Summary
This study addresses the challenge of inferring biologically plausible learning rules directly from animal behavioral data in de novo learning tasks, without imposing prior parametric assumptions (e.g., Q-learning) or restricting analyses to simplified paradigms (e.g., multi-armed bandits). We propose a nonparametric deep learning framework: a feedforward deep neural network (DNN) models trial-wise policy updates, extended to a recurrent neural network (RNN) that captures non-Markovian, history-dependent dynamics. This approach enables data-driven identification of asymmetric update mechanisms (differential adaptation after correct versus error feedback) and of complex stimulus integration. Validated on synthetic data, the model accurately recovers ground-truth learning rules; applied to large-scale mouse perceptual decision-making datasets, it significantly outperforms conventional models in behavioral prediction. Our results point to intrinsic history sensitivity and update asymmetry as fundamental features of biological learning.
📝 Abstract
Understanding how animals learn is a central challenge in neuroscience, with growing relevance to the development of animal- or human-aligned artificial intelligence. However, most existing approaches assume specific parametric forms for the learning rule (e.g., Q-learning, policy gradient) or are limited to simplified settings like bandit tasks, which do not involve learning a new input-output mapping from scratch. In contrast, animals must often learn new behaviors de novo, which poses a rich challenge for learning-rule inference. We target this problem by inferring learning rules directly from animal decision-making data during de novo task learning, a setting that requires models flexible enough to capture suboptimality, history dependence, and rich external stimulus integration without strong structural priors. We first propose a nonparametric framework that parameterizes the per-trial update of policy weights with a deep neural network (DNN), and validate it by recovering ground-truth rules in simulation. We then extend this framework to a recurrent variant (RNN) that captures non-Markovian dynamics by allowing updates to depend on trial history. Applied to a large behavioral dataset of mice learning a sensory decision-making task over multiple weeks, our models improved predictions of behavior on held-out data. The inferred rules revealed asymmetric updates after correct versus error trials and history dependence, consistent with non-Markovian learning. Overall, these results introduce a flexible framework for inferring biological learning rules from behavioral data in de novo learning tasks, providing insights to inform experimental training protocols and the development of behavioral digital twins.
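To make the core idea concrete, here is a minimal sketch of the nonparametric setup the abstract describes: a small network g_θ maps the current policy weights, stimulus, action, and reward to a per-trial weight update. Everything below is hypothetical and illustrative only (the toy logistic policy, the hand-rolled two-layer MLP, the dimensions, and the reward rule are assumptions, not the paper's actual architecture or training procedure); in the real framework g_θ would be fit so that the induced choice sequence matches the animal's behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

def g_theta(params, inp):
    # Tiny two-layer MLP standing in for the learned update rule g_theta.
    W1, b1, W2, b2 = params
    h = np.tanh(inp @ W1 + b1)
    return h @ W2 + b2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 2 stimulus features, logistic policy with 2 weights.
d = 2                       # policy weight dimension
inp_dim = d + d + 1 + 1     # input to g_theta: [w, stimulus, action, reward]
hid = 16
params = (rng.normal(0, 0.1, (inp_dim, hid)), np.zeros(hid),
          rng.normal(0, 0.1, (hid, d)), np.zeros(d))

w = np.zeros(d)             # per-trial policy weights of the simulated animal
trajectory = [w.copy()]
for t in range(5):                      # a few simulated trials
    x = rng.normal(size=d)              # stimulus on this trial
    p_right = sigmoid(w @ x)            # logistic choice policy
    a = float(rng.random() < p_right)   # sampled binary choice
    r = float(a == (x[0] > 0))          # toy reward: correct if sign matched
    # Nonparametric per-trial update: delta_w = g_theta(w, x, a, r).
    delta_w = g_theta(params, np.concatenate([w, x, [a], [r]]))
    w = w + delta_w
    trajectory.append(w.copy())
```

Because the update depends freely on the reward signal, this parameterization can in principle express the asymmetric correct-versus-error updates the study reports; the recurrent variant would additionally carry a hidden state across trials so the update can depend on history beyond the current trial.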