🤖 AI Summary
This paper addresses the problem of estimating, from a single trajectory of length *n* of an α-mixing stochastic process, the probability mass that the stationary distribution places on states occurring with a given frequency in that trajectory. The error in estimating this vector of masses is measured in total variation distance. Methodologically, we derive new performance bounds for the WingIt estimator (a modification of Good-Turing originally developed for Markovian sequences) and for the plug-in estimator under α-mixing; propose a hybrid strategy that combines the two; and develop concentration inequalities for a self-normalized statistic of mixing sequences, because Poissonization cannot be applied under non-i.i.d. dependence. Theoretically, our estimator is universally consistent as *n* → ∞ over finite but potentially large state spaces, and it recovers known i.i.d. results as special cases.
📝 Abstract
Suppose we observe a trajectory of length $n$ from an $\alpha$-mixing stochastic process over a finite but potentially large state space. We consider the problem of estimating the probability mass placed by the stationary distribution of any such process on elements that occur with a certain frequency in the observed sequence. We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$ and recovering known results for i.i.d. sequences as special cases. Our proposed methodology carefully combines the plug-in (or empirical) estimator with a recently-proposed modification of the Good--Turing estimator called \textsc{WingIt}, which was originally developed for Markovian sequences. En route to controlling the error of our estimator, we develop new performance bounds on \textsc{WingIt} and the plug-in estimator for $\alpha$-mixing stochastic processes. Importantly, the extensively used method of Poissonization can no longer be applied in our non-i.i.d. setting, and so we develop complementary tools -- including concentration inequalities for a natural self-normalized statistic of mixing sequences -- that may prove independently useful in the design and analysis of estimators for related problems.
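To make the estimation target concrete, the following minimal sketch computes the mass of states seen exactly `zeta` times, together with two textbook estimates of it: the plug-in estimate and the classical Good-Turing estimate. This is not the paper's method. In particular, \textsc{WingIt} and the proposed combination rule are not reproduced here, and all function names and the toy distribution `pi` are hypothetical choices made for illustration.

```python
from collections import Counter


def frequency_counts(seq):
    """Return (n, phi), where phi[z] is the number of distinct symbols seen exactly z times."""
    symbol_counts = Counter(seq)
    phi = Counter(symbol_counts.values())
    return len(seq), phi


def plug_in_mass(seq, zeta):
    """Plug-in estimate: each symbol seen zeta times is assigned empirical mass zeta / n."""
    n, phi = frequency_counts(seq)
    return zeta * phi[zeta] / n


def good_turing_mass(seq, zeta):
    """Classical Good-Turing estimate (not WingIt): (zeta + 1) * phi[zeta + 1] / n."""
    n, phi = frequency_counts(seq)
    return (zeta + 1) * phi[zeta + 1] / n


def true_mass(seq, zeta, pi):
    """Target quantity: total stationary mass of the states observed exactly zeta times.

    `pi` is a dict mapping each state to its stationary probability (assumed known
    here only so the target can be computed for comparison).
    """
    symbol_counts = Counter(seq)
    return sum(p for state, p in pi.items() if symbol_counts[state] == zeta)


# Toy example: a short hand-written trajectory and a hypothetical stationary distribution.
trajectory = list("aabacbd")
pi = {"a": 0.4, "b": 0.3, "c": 0.1, "d": 0.1, "e": 0.1}

missing_target = true_mass(trajectory, 0, pi)          # mass of unseen states
missing_plug_in = plug_in_mass(trajectory, 0)          # always 0 for zeta = 0
missing_good_turing = good_turing_mass(trajectory, 0)  # uses the number of singletons
```

The plug-in estimate is always zero at `zeta = 0`, which is one reason small frequencies call for a Good-Turing-style estimator while larger frequencies can be handled by the plug-in estimate.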