Central Limit Theorems for Transition Probabilities of Controlled Markov Chains

📅 2025-08-02

🤖 AI Summary

This paper establishes the asymptotic normality of the nonparametric estimator of transition probability matrices in controlled Markov chains (CMCs) with finite state and action spaces. It provides the first necessary and sufficient conditions for the central limit theorem (CLT) to hold, and shows that the CLT can fail when the logging policy explores insufficiently or when the chain loses irreducibility. Building on this, the paper derives joint asymptotic normality for value functions, Q-functions, and advantage functions, which enables asymptotically valid offline policy evaluation and inference on optimal policies. Methodologically, it combines empirical process theory, Markov chain stability analysis, and nonparametric statistical inference. Key contributions include: (1) the first precise characterization of the CLT for CMC transition estimators; (2) a unified asymptotic distribution theory covering any stationary stochastic policy, including the optimal policy recovered from the estimated model; and (3) a residual-based goodness-of-fit testing framework for checking whether the logged data is stochastic and for validating model assumptions.
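The count-based estimator referred to above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions:

- the data is a single logged trajectory with integer-coded states and actions;
- the helper names `estimate_transitions` and `plugin_covariance` are hypothetical;
- the covariance uses the classical multinomial form (diag(p) - p pᵀ) / N(s, a).

That covariance is what one would expect the limit to look like when the logging policy visits every state-action pair often enough. The paper's exact conditions may differ, and it stresses that some logging policies admit no CLT at all.

```python
import numpy as np


def estimate_transitions(states, actions, n_states, n_actions):
    """Count-based estimate of P(s' | s, a) from one logged trajectory.

    states:  s_0, ..., s_T as integers in [0, n_states)     (length T + 1)
    actions: a_0, ..., a_{T-1} as integers in [0, n_actions) (length T)
    Returns (P_hat, counts): P_hat[s, a, s'] = N(s, a, s') / N(s, a) and
    counts[s, a] = N(s, a). Rows of unvisited pairs are NaN (nothing to estimate).
    """
    trans = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in zip(states[:-1], actions, states[1:]):
        trans[s, a, s_next] += 1
    counts = trans.sum(axis=2)
    with np.errstate(invalid="ignore"):  # 0 / 0 for unvisited pairs gives NaN
        P_hat = trans / counts[:, :, None]
    return P_hat, counts


def plugin_covariance(p_row, n_visits):
    """Plug-in covariance (diag(p) - p p^T) / N(s, a) of one estimated row P_hat[s, a].

    Classical multinomial form; only a sensible approximation when N(s, a) is
    large and the (assumed) CLT conditions on the logging policy hold.
    """
    p = np.asarray(p_row, dtype=float)
    return (np.diag(p) - np.outer(p, p)) / n_visits


# Usage on a short hypothetical trajectory with 2 states and 2 actions.
P_hat, counts = estimate_transitions([0, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1], 2, 2)
cov = plugin_covariance(P_hat[1, 0], counts[1, 0])
```

In this toy trajectory the pair (1, 0) is visited twice and moves once to each state. So `P_hat[1, 0]` is [0.5, 0.5], and its plug-in covariance has entries of magnitude 0.25 / 2.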

📝 Abstract

We develop a central limit theorem (CLT) for the non-parametric estimator of the transition matrices in controlled Markov chains (CMCs) with finite state-action spaces. Our results establish precise conditions on the logging policy under which the estimator is asymptotically normal, and reveal settings in which no CLT can exist. We then build on this result to derive CLTs for the value, Q-, and advantage functions of any stationary stochastic policy, including the optimal policy recovered from the estimated model. As a corollary, we derive goodness-of-fit tests that let us test whether the logged data is stochastic. These results provide new statistical tools for offline policy evaluation and optimal policy recovery, and enable hypothesis tests for transition probabilities.
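The step from transition matrices to value, Q-, and advantage functions can be illustrated with a plug-in computation: take an (estimated) transition array, then solve the Bellman equation for a fixed stationary stochastic policy. The sketch assumes a discounted infinite-horizon criterion with known rewards R(s, a) and a discount `gamma`. Those choices and the name `policy_evaluation` are illustrative assumptions, not the paper's stated setup.

```python
import numpy as np


def policy_evaluation(P, R, pi, gamma):
    """Plug-in value, Q- and advantage functions of a stationary stochastic policy.

    P:     (S, A, S) transition array with valid probability rows (no NaN rows),
           e.g. a count-based estimate restricted to visited pairs.
    R:     (S, A) rewards, assumed known.
    pi:    (S, A) policy, each row summing to one.
    gamma: discount factor in [0, 1) (illustrative assumption).
    """
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    # Bellman equation V = r_pi + gamma * P_pi V, solved as a linear system.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) V(t); P @ V contracts the last axis.
    Q = R + gamma * (P @ V)
    A = Q - V[:, None]                     # advantage A(s, a) = Q(s, a) - V(s)
    return V, Q, A


# Usage: two states, two actions, every action keeps the chain in its current state.
P = np.zeros((2, 2, 2))
P[0, :, 0] = 1.0
P[1, :, 1] = 1.0
R = np.array([[1.0, 0.0], [0.0, 2.0]])
pi = np.full((2, 2), 0.5)                  # uniform stochastic policy
V, Q, A = policy_evaluation(P, R, pi, gamma=0.5)
```

Because V solves (I - gamma P_pi) V = r_pi, it is a smooth function of P whenever that matrix is invertible, which always holds for gamma < 1. This smoothness is why a CLT for the estimated transitions is a natural starting point for CLTs of these downstream quantities.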

Problem

Research questions and friction points this paper is trying to address.

Develop a CLT for transition matrices in controlled Markov chains

Establish conditions for asymptotic normality of the non-parametric estimator

Derive CLTs for value functions and construct goodness-of-fit tests
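To show how asymptotic normality of the transition estimator turns into a goodness-of-fit test, here is the classical Pearson chi-square statistic for a fully specified null P0. This is an illustration only:

- the paper's own tests, including the one for whether the logged data is stochastic, may use a different statistic or hypothesis;
- the name `pearson_gof` is hypothetical;
- P0 is assumed to be strictly positive.

```python
import numpy as np


def pearson_gof(trans_counts, P0):
    """Pearson chi-square statistic for H0: P(. | s, a) = P0(. | s, a) at every visited pair.

    trans_counts: (S, A, S) array of observed transition counts N(s, a, s').
    P0:           (S, A, S) hypothesised transition probabilities, strictly positive.
    Returns (statistic, degrees_of_freedom). Under H0 and large visit counts the
    statistic is approximately chi-square with that many degrees of freedom.
    """
    counts = trans_counts.sum(axis=2)                  # N(s, a)
    visited = counts > 0                               # only visited pairs carry information
    observed = trans_counts[visited]                   # shape (k, S) for k visited pairs
    expected = counts[visited][:, None] * P0[visited]  # N(s, a) * P0(s' | s, a)
    stat = np.sum((observed - expected) ** 2 / expected)
    df = observed.shape[0] * (observed.shape[1] - 1)   # |S| - 1 per visited pair
    return float(stat), int(df)


# Usage: 2 states, 2 actions; action 1 is never logged, so it adds no degrees of freedom.
trans_counts = np.zeros((2, 2, 2))
trans_counts[0, 0] = [30, 70]
trans_counts[1, 0] = [50, 50]
P0 = np.full((2, 2, 2), 0.5)
stat, df = pearson_gof(trans_counts, P0)
```

With SciPy available, an approximate p-value is `scipy.stats.chi2.sf(stat, df)`. The chi-square approximation inherits whatever conditions the transition CLT needs, so it is only trustworthy when every visited pair has been logged often enough.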

Innovation

Methods, ideas, or system contributions that make the work stand out.

A CLT for the non-parametric transition matrix estimator

CLTs for the value, Q-, and advantage functions

Goodness-of-fit tests for logged data
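To make the passage from a transition CLT to a value-function CLT concrete, here is a delta-method derivation for the value function of a fixed stationary stochastic policy π under a discounted criterion. It rests on the following assumptions, which are ours rather than the paper's:

- rewards r(s, a) are known and 0 ≤ γ < 1;
- every pair is visited a linear number of times, N(s, a)/n → d(s, a) > 0;
- the estimator obeys the classical row-wise multinomial CLT, with limits independent across pairs.

The paper's own conditions and proof route may differ.

```latex
% Illustrative assumptions (not the paper's): known rewards r(s,a), 0 <= \gamma < 1,
% N(s,a)/n -> d(s,a) > 0 for every pair, and the classical row-wise CLT
% \sqrt{N(s,a)}(\hat P(.|s,a) - P(.|s,a)) -> N(0, diag(p_{sa}) - p_{sa} p_{sa}^T),
% with limits independent across pairs.
\begin{aligned}
V &= r_\pi + \gamma P_\pi V,
\qquad P_\pi(s,t) = \sum_a \pi(a \mid s)\, P(t \mid s,a), \\
dV &= \gamma\,(dP_\pi)\,V + \gamma P_\pi\, dV
\;\Longrightarrow\;
dV = \gamma\, M\,(dP_\pi)\,V, \qquad M = (I - \gamma P_\pi)^{-1}, \\
\frac{\partial V(u)}{\partial P(t \mid s,a)} &= \gamma\, M(u,s)\, \pi(a \mid s)\, V(t), \\
\sqrt{n}\,\bigl(\hat V(u) - V(u)\bigr) &\xrightarrow{d}
\mathcal{N}\!\Bigl(0,\; \gamma^2 \sum_{s,a} \frac{M(u,s)^2\, \pi(a \mid s)^2}{d(s,a)}\,
\operatorname{Var}_{t \sim P(\cdot \mid s,a)}\bigl[V(t)\bigr]\Bigr).
\end{aligned}
```

The last line applies the identity vᵀ(diag(p) - p pᵀ)v = Σ_t p_t v_t² - (Σ_t p_t v_t)² = Var_{t∼p}[v_t] to v = V for each pair (s, a). The identity also shows that only the spread of V under P(· | s, a) matters, not its level. Q and A can be linearized the same way, since Q(s, a) = r(s, a) + γ Σ_t P(t | s, a) V(t) and A = Q - V.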

🔎 Similar Papers

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning