🤖 AI Summary
This work investigates whether pretrained large language models (LLMs) can perform zero-shot modeling of hidden Markov models (HMMs) via in-context learning (ICL), circumventing the high computational cost of conventional HMM parameter estimation.
Method: We generate synthetic HMM sequences, conduct a theoretical error-bound analysis, and empirically evaluate ICL on both the synthetic data and real-world, cross-species animal decision-making behavior (a minimal sketch of the synthetic setup appears after this summary).
Contribution/Results: We provide the first empirical evidence that ICL implicitly captures HMM structure; we characterize novel ICL scaling trends governed by HMM properties, including the number of hidden states and the transition entropy; and we propose ICL as a practical diagnostic for latent structure in scientific data. Experiments show that ICL reaches prediction accuracy approaching the theoretical optimum across diverse synthetic HMMs and matches expert-designed models on real-world animal decision-making tasks, providing theoretical and empirical grounding for ICL-based implicit modeling of state-space dynamics.
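As an illustration of the synthetic setup described above, the sketch below (not the authors' code; all parameter choices are arbitrary assumptions) samples an observation sequence from a random HMM and computes the Bayes-optimal next-observation distribution with the forward algorithm, i.e. the theoretical optimum that ICL predictions are compared against.

```python
# Minimal sketch, assuming a discrete-observation HMM with random parameters.
import numpy as np

rng = np.random.default_rng(0)

def random_hmm(n_states: int, n_obs: int):
    """Draw row-stochastic transition (A) and emission (B) matrices and an initial distribution."""
    A = rng.dirichlet(np.ones(n_states), size=n_states)  # A[i, j] = P(z_{t+1}=j | z_t=i)
    B = rng.dirichlet(np.ones(n_obs), size=n_states)     # B[i, k] = P(x_t=k | z_t=i)
    pi = rng.dirichlet(np.ones(n_states))                 # initial state distribution
    return A, B, pi

def sample_sequence(A, B, pi, T: int):
    """Generate T observations from the HMM."""
    z = rng.choice(len(pi), p=pi)
    xs = []
    for _ in range(T):
        xs.append(rng.choice(B.shape[1], p=B[z]))  # emit from the current state
        z = rng.choice(A.shape[0], p=A[z])         # transition to the next state
    return np.array(xs)

def optimal_next_obs_dist(A, B, pi, xs):
    """Forward algorithm: P(x_{T+1} | x_1..x_T) under the true parameters (the Bayes-optimal predictor)."""
    alpha = pi * B[:, xs[0]]
    alpha /= alpha.sum()
    for x in xs[1:]:
        alpha = (alpha @ A) * B[:, x]
        alpha /= alpha.sum()          # normalize to avoid underflow
    return (alpha @ A) @ B            # predictive distribution over the next observation

A, B, pi = random_hmm(n_states=4, n_obs=3)
xs = sample_sequence(A, B, pi, T=200)
print(optimal_next_obs_dist(A, B, pi, xs))  # theoretical-optimum next-symbol probabilities
```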
📝 Abstract
Hidden Markov Models (HMMs) are foundational tools for modeling sequential data with latent Markovian structure, yet fitting them to real-world data remains computationally challenging. In this work, we show that pre-trained large language models (LLMs) can effectively model data generated by HMMs via in-context learning (ICL) – their ability to infer patterns from examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve predictive accuracy approaching the theoretical optimum. We uncover novel scaling trends influenced by HMM properties, and offer theoretical conjectures for these empirical observations. We also provide practical guidelines for scientists on using ICL as a diagnostic tool for complex data. On real-world animal decision-making tasks, ICL achieves competitive performance with models designed by human experts. To our knowledge, this is the first demonstration that ICL can learn and predict HMM-generated sequences – an advance that deepens our understanding of in-context learning in LLMs and establishes its potential as a powerful tool for uncovering hidden structure in complex scientific data.
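For concreteness, here is a hypothetical sketch of the ICL evaluation implied by the abstract: the observation sequence is serialized as a plain-text prompt and an LLM is asked to continue it, with accuracy measured against the held-out next symbols. The `query_llm_next_symbol` callable is a placeholder for any LLM client, not a real API, and the serialization format is an assumption rather than the authors' prompt design.

```python
# Hypothetical ICL evaluation loop; plug in any LLM client for the placeholder callable.
from typing import Callable, Sequence

def make_prompt(xs: Sequence[int]) -> str:
    """Render the observation history as in-context examples, e.g. '2 0 1 1 ...'."""
    return " ".join(str(int(x)) for x in xs)

def icl_accuracy(xs: Sequence[int],
                 query_llm_next_symbol: Callable[[str], int],
                 context_len: int = 100) -> float:
    """Fraction of positions where the LLM's next-symbol guess matches the actual next observation."""
    hits, total = 0, 0
    for t in range(context_len, len(xs)):
        prompt = make_prompt(xs[:t])
        pred = query_llm_next_symbol(prompt)  # placeholder LLM call
        hits += int(pred == xs[t])
        total += 1
    return hits / max(total, 1)
```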