🤖 AI Summary
Inference for models with recursively defined likelihoods is computationally expensive and scales poorly to large datasets. This work proposes a stabilised weighted subsampling method that accelerates inference by constructing unbiased estimators of the log-likelihood and its gradient. The approach assigns higher sampling probabilities to early observations to reduce recursion depth, and adds a stabilisation framework that restricts how quickly the sampling probabilities decay, balancing variance control against computational cost. The estimators can be embedded in stochastic optimization, variational Bayes, and MCMC algorithms. Experiments on GARCH-type conditional volatility models, including standard and threshold GARCH, show substantial computational speed-ups while preserving inference accuracy. The method outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
📝 Abstract
Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence expected computational cost. However, if the sampling probabilities decay slowly, late observations are included frequently and the computational cost remains high, while overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, underpinned by theoretical results, that restricts the decay of the sampling probabilities to avoid both variance and computational pathologies through principled hyperparameter tuning. We further consider an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The proposed estimators are generic building blocks for subsampling-based inference and can be embedded within frameworks including stochastic optimisation, variational Bayes, and Markov chain Monte Carlo. Applications to conditional volatility models, including standard and threshold generalised autoregressive conditional heteroskedasticity models, demonstrate substantial computational speed-ups while maintaining inferential accuracy. The proposed approach outperforms uniform subsampling and compares favourably with recent stochastic gradient and divide-and-conquer MCMC methods for dependent data.
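The idea of weighting early observations can be illustrated with a minimal sketch. It is not the authors' estimator or stabilisation scheme: the power-law decay, the uniform floor, the Gaussian GARCH(1,1) likelihood started at the unconditional variance, and all function and parameter names (`decaying_probs`, `decay`, `floor`, `subsampled_loglik`) are assumptions chosen for illustration. The sketch draws indices with probabilities that decrease in time, runs the variance recursion only up to the largest sampled index, and reweights each sampled term by the inverse of its sampling probability, which gives an unbiased estimate of the full log-likelihood.

```python
import math
import random


def garch_loglik_terms(y, omega, alpha, beta, upto):
    """Gaussian GARCH(1,1) log-density terms for t = 0..upto-1.

    Assumption: the recursion starts at the unconditional variance.
    """
    sigma2 = omega / (1.0 - alpha - beta)
    terms = []
    for t in range(upto):
        terms.append(
            -0.5 * (math.log(2.0 * math.pi) + math.log(sigma2) + y[t] ** 2 / sigma2)
        )
        # Recursive update: the term at t+1 depends on all earlier observations.
        sigma2 = omega + alpha * y[t] ** 2 + beta * sigma2
    return terms


def decaying_probs(n, decay, floor):
    """Hypothetical sampling probabilities: power-law decay plus a uniform floor.

    The floor keeps every probability at least floor / n, which bounds the
    inverse-probability weights (an illustrative stand-in for stabilisation).
    """
    raw = [(t + 1) ** (-decay) for t in range(n)]
    total = sum(raw)
    return [(1.0 - floor) * r / total + floor / n for r in raw]


def subsampled_loglik(y, params, probs, m, rng):
    """Unbiased weighted-subsampling estimate of the full log-likelihood.

    Returns the estimate and the recursion depth that was actually evaluated.
    """
    idx = rng.choices(range(len(y)), weights=probs, k=m)
    depth = max(idx) + 1  # the recursion stops at the latest sampled index
    terms = garch_loglik_terms(y, *params, upto=depth)
    estimate = sum(terms[t] / probs[t] for t in idx) / m
    return estimate, depth


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # placeholder data
    probs = decaying_probs(len(y), decay=0.5, floor=0.2)
    estimate, depth = subsampled_loglik(y, (0.1, 0.1, 0.8), probs, m=50, rng=rng)
    print(estimate, depth)
```

In this sketch, a larger `decay` or a smaller `floor` lowers the expected recursion depth but increases the weights `1 / probs[t]` on late observations, which is the variance versus cost trade-off the abstract describes.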