Mechanistic Interpretability of RNNs emulating Hidden Markov Models

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the theoretical gap between the continuous dynamics of recurrent neural networks (RNNs) and the piecewise-constant latent states observed empirically in natural behavior: specifically, how RNNs can intrinsically implement discrete, stochastic dynamics akin to hidden Markov models (HMMs). Method: combining end-to-end training with reverse-engineering analysis, the study systematically evaluates RNN generalization across fully connected, cyclic, and linear-chain HMM tasks. Contribution/Results: the authors identify a functional decomposition in which RNNs perform probabilistic computation via coordinated switching between regions of slow, noise-driven dynamics and fast, deterministic transitions, yielding modular dynamical motifs governed by a sparse set of "kick" neurons. Without input, the trained dynamics collapse toward a single fixed point; under stochastic input, they sustain noise-driven dynamics along closed orbits and closely replicate HMM emission statistics and state-transition probabilities. These findings demonstrate that continuous RNNs can intrinsically encode discrete latent variables, establishing a principle for modeling population-level neural dynamics.
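To make the target process concrete, here is a minimal sketch of sampling emissions from a small fully connected HMM. The transition matrix `A` and emission matrix `B` below are illustrative placeholders, not the paper's task parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully connected 3-state HMM (values illustrative, not from the paper):
# A[i, j] = P(state j at t+1 | state i at t), B[i, k] = P(emission k | state i).
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

def sample_hmm(T, A, B, rng):
    """Sample length-T state and emission sequences from the HMM."""
    n_states, n_emissions = B.shape
    states = np.empty(T, dtype=int)
    emissions = np.empty(T, dtype=int)
    s = rng.integers(n_states)  # uniform initial state
    for t in range(T):
        states[t] = s
        emissions[t] = rng.choice(n_emissions, p=B[s])
        s = rng.choice(n_states, p=A[s])
    return states, emissions

states, emissions = sample_hmm(10_000, A, B, rng)
# With A's high self-transition probabilities, emissions form long piecewise-constant
# runs: the discrete, stochastic statistics the RNN is trained to match.
```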

📝 Abstract
Recurrent neural networks (RNNs) provide a powerful approach in neuroscience to infer latent dynamics in neural populations and to generate hypotheses about the neural computations underlying behavior. However, past work has focused on relatively simple, input-driven, and largely deterministic behaviors - little is known about the mechanisms that would allow RNNs to generate the richer, spontaneous, and potentially stochastic behaviors observed in natural settings. Modeling with Hidden Markov Models (HMMs) has revealed a segmentation of natural behaviors into discrete latent states with stochastic transitions between them, a type of dynamics that may appear at odds with the continuous state spaces implemented by RNNs. Here we first show that RNNs can replicate HMM emission statistics and then reverse-engineer the trained networks to uncover the mechanisms they implement. In the absence of inputs, the activity of trained RNNs collapses towards a single fixed point. When driven by stochastic input, trajectories instead exhibit noise-sustained dynamics along closed orbits. Rotation along these orbits modulates the emission probabilities and is governed by transitions between regions of slow, noise-driven dynamics connected by fast, deterministic transitions. The trained RNNs develop highly structured connectivity, with a small set of "kick neurons" initiating transitions between these regions. This mechanism emerges during training as the network shifts into a regime of stochastic resonance, enabling it to perform probabilistic computations. Analyses across multiple HMM architectures - fully connected, cyclic, and linear-chain - reveal that this solution generalizes through the modular reuse of the same dynamical motif, suggesting a compositional principle by which RNNs can emulate complex discrete latent dynamics.
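The contrast between the two regimes described in the abstract can be probed with a standard simulation protocol. Below is a minimal sketch, assuming a generic continuous-time tanh RNN with random weights standing in for the trained connectivity; it only makes the with/without-noise comparison concrete and will not reproduce the trained networks' structure.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dt, tau = 100, 2000, 0.01, 0.1

# Random subcritical weights stand in for the trained connectivity
# (assumption: the paper's trained weights are not reproduced here).
W = rng.normal(0.0, 0.8 / np.sqrt(N), size=(N, N))

def simulate(noise_std):
    """Euler integration of tau * dx/dt = -x + W @ tanh(x) + noise."""
    x = rng.normal(0.0, 0.1, N)
    traj = np.empty((T, N))
    for t in range(T):
        noise = noise_std * np.sqrt(dt) * rng.normal(0.0, 1.0, N)
        x = x + (dt / tau) * (-x + W @ np.tanh(x)) + noise
        traj[t] = x
    return traj

quiet = simulate(noise_std=0.0)   # activity collapses toward a fixed point
driven = simulate(noise_std=0.5)  # stochastic input sustains ongoing activity
```

In the trained networks, the noise-driven trajectories additionally organize into closed orbits whose rotation modulates the emission probabilities; with random weights, the driven condition merely fluctuates around the fixed point.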
Problem

Research questions and friction points this paper is trying to address.

Can continuous-state RNNs replicate the emission statistics and stochastic transitions of HMMs?
What mechanisms do trained RNNs use to generate discrete, stochastic latent dynamics?
Does a single mechanism generalize across HMM architectures to emulate complex discrete behaviors?
Innovation

Methods, ideas, or system contributions that make the work stand out.

RNNs emulate HMMs using noise-sustained closed orbits
Kick neurons initiate fast transitions between slow regions
Modular reuse of the same dynamical motif enables probabilistic computations (see the sketch below)
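One way the claimed match to HMM statistics could be verified: decode a discrete state sequence from the network's output and compare its empirical transition matrix to the target. A minimal sketch follows; `decoded_states` and the `viterbi` decoding step are hypothetical assumptions, not the paper's analysis pipeline.

```python
import numpy as np

def empirical_transition_matrix(states, n_states):
    """Count state-to-state transitions and normalize rows to probabilities."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.where(row_sums > 0, row_sums, 1.0)

# Hypothetical usage: compare against the target HMM's transition matrix A.
# decoded_states = viterbi(rnn_emissions, A, B)  # decoding step (assumed)
# print(empirical_transition_matrix(decoded_states, n_states=3))
```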
Elia Torre
Institute of Neuroinformatics, University of Zurich & ETH Zurich

Michele Viscione
Institute of Neuroinformatics, University of Zurich & ETH Zurich

Lucas Pompe
Institute of Neuroinformatics, University of Zurich & ETH Zurich

Benjamin F. Grewe
Institute of Neuroinformatics, ETH Zurich
Systems and Computational Neuroscience · Bio-plausible Deep Learning

Valerio Mante
Institute of Neuroinformatics, University of Zurich & ETH Zurich