AI Summary
Neural estimators, deep-learning-based methods for parametric statistical inference, exhibit strong empirical performance but have long lacked rigorous statistical theory. Their risk analysis remains challenging due to the complex interplay of model approximation, optimization, and generalization.
Method: We propose the first systematic theoretical framework for analyzing the estimation risk of neural estimators. Under verifiable regularity conditions that are not tied to any specific network architecture, we decompose the risk into bias, variance, and approximation error components and establish convergence criteria for each. Our analysis integrates statistical learning theory with parametric modeling principles.
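To fix ideas, a decomposition of this kind can be written schematically as follows; the notation is illustrative and not taken from the paper. Writing $R$ for the population risk, $R_n$ for its empirical counterpart on $n$ simulated samples, $\mathcal{F}$ for the class of networks, $\hat f_n$ for the trained network, and $R^*$ for the optimal risk, a standard bound is:

```latex
R(\hat f_n) - R^*
  \;\le\; \underbrace{2 \sup_{f \in \mathcal{F}} \bigl| R(f) - R_n(f) \bigr|}_{\text{generalization (variance)}}
  \;+\; \underbrace{R_n(\hat f_n) - \inf_{f \in \mathcal{F}} R_n(f)}_{\text{optimization (bias)}}
  \;+\; \underbrace{\inf_{f \in \mathcal{F}} R(f) - R^*}_{\text{approximation}}
```

The labels in parentheses are a loose mapping onto the summary's three components; the result below is that each term vanishes under suitable assumptions.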
Results: We prove that, under mild assumptions, all three error terms vanish in probability. Extensive experiments on canonical statistical tasks, including location/scale estimation and exponential-family parameter inference, demonstrate consistency between theoretical convergence rates and empirical performance. This work provides the first general theoretical foundation for the reliability and asymptotic validity of neural estimators.
Abstract
Neural estimators are simulation-based estimators for the parameters of a family of statistical models that learn a direct mapping from the sample to the parameter vector. They benefit from the versatility of available network architectures and the efficient training methods developed in the field of deep learning. Neural estimators are amortized in the sense that, once trained, they can be applied to any new data set at almost no computational cost. While many papers have demonstrated strong performance of these methods in simulation studies and real-world applications, no statistical guarantees have so far been available to support these observations theoretically. In this work, we study the risk of neural estimators by decomposing it into several terms that can be analyzed separately. We formulate easy-to-check assumptions ensuring that each term converges to zero, and we verify them for popular applications of neural estimators. Our results provide a general recipe for deriving theoretical guarantees for broader classes of architectures and estimation problems.
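To make the amortized workflow concrete, here is a minimal sketch in PyTorch, assuming a Gaussian location model with a uniform prior; the architecture, prior, sample size, and the sorting trick used for permutation invariance are all illustrative choices, not the paper's setup.

```python
# Minimal sketch of an amortized neural estimator (illustrative, not the
# paper's implementation). Task: estimate the location parameter theta of
# N(theta, 1) from an i.i.d. sample of size n.
import torch
import torch.nn as nn

n = 30            # sample size seen by the estimator
iterations = 2000 # training iterations
batch_size = 256

# The network maps a sample to an estimate of theta. Sorting the sample
# is a crude way to enforce permutation invariance in this toy example.
net = nn.Sequential(
    nn.Linear(n, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(iterations):
    # Simulate parameters from the prior and data from the model.
    theta = torch.empty(batch_size, 1).uniform_(-3.0, 3.0)
    x = theta + torch.randn(batch_size, n)        # X_i ~ N(theta, 1)
    x_sorted, _ = torch.sort(x, dim=1)
    loss = ((net(x_sorted) - theta) ** 2).mean()  # empirical squared-error risk
    opt.zero_grad()
    loss.backward()
    opt.step()

# Amortization: estimation on new data is a single forward pass.
x_new = 1.5 + torch.randn(1, n)
x_new_sorted, _ = torch.sort(x_new, dim=1)
print(net(x_new_sorted).item())  # should be close to the true theta = 1.5
```

The last three lines illustrate the amortization property emphasized in the abstract: all the computational cost is paid once at training time, and applying the trained estimator to a new data set is essentially free.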