AI Summary
This paper extends Schmidt-Hieber (2020)'s minimax optimality result for deep neural networks (DNNs), originally established under i.i.d. assumptions, to non-i.i.d. time-series data (e.g., Markov chains) and to generalized supervised learning tasks. Method: it derives the first PAC-Bayesian risk upper bound for DNNs under dependent data, combining a Bernstein inequality due to Paulin (2015), generalized Bayesian estimation, and the pseudo-spectral gap of the Markov chain to handle least-squares and logistic regression in a unified way. Contribution/Results: the derived risk bound matches the information-theoretic lower bound up to logarithmic factors. It recovers Schmidt-Hieber's optimal rate for regression and, for the first time, establishes a matching lower bound under the logistic loss, rigorously confirming the minimax optimality of DNNs for learning from dependent data.
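To fix ideas, the generic shape of a generalized Bayesian (Gibbs) estimator and of a PAC-Bayes-type oracle inequality is sketched below. The notation (empirical risk r_n, population risk R, prior pi, inverse temperature lambda, constant C) is assumed here for illustration and is not taken from the paper; the paper's exact statement, constants, and dependence terms differ.

```latex
% Illustrative sketch (notation assumed here, not the paper's own statement).
% r_n: empirical risk, R: population risk, \pi: prior, \lambda > 0: inverse temperature.
% Gibbs (generalized Bayesian) posterior:
\widehat{\rho}_\lambda(\mathrm{d}\theta) \;\propto\; \exp\bigl(-\lambda\, r_n(\theta)\bigr)\, \pi(\mathrm{d}\theta).
% Generic PAC-Bayes oracle inequality: with probability at least 1-\varepsilon,
\mathbb{E}_{\theta \sim \widehat{\rho}_\lambda}\bigl[R(\theta)\bigr]
  \;\le\; \inf_{\rho}\,\Bigl\{ \mathbb{E}_{\theta \sim \rho}\bigl[R(\theta)\bigr]
  \;+\; C\,\frac{\mathrm{KL}(\rho \,\|\, \pi) + \log(1/\varepsilon)}{\lambda} \Bigr\},
% where C accounts for the data dependence (here controlled via the pseudo-spectral gap).
```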
Abstract
In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation over a large class of functions defined by composition. In this paper, we extend these results in several directions. First, we remove the i.i.d. assumption on the observations to allow for some time dependence: the observations are assumed to form a Markov chain with a non-null pseudo-spectral gap. We then study a more general class of machine learning problems, which includes least-squares and logistic regression as special cases. Leveraging PAC-Bayes oracle inequalities and a version of the Bernstein inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-squares regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.
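For orientation on the dependence assumption, the pseudo-spectral gap of Paulin (2015) for a Markov kernel P is usually defined as below; this is a sketch of the standard definition, not the paper's own statement, and the operator notation is assumed here.

```latex
% Pseudo-spectral gap of a Markov kernel P (Paulin, 2015); sketch for orientation.
% P^* is the adjoint of P in L^2 of the stationary distribution, and \gamma(\cdot)
% denotes the spectral gap of a self-adjoint kernel.
\gamma_{\mathrm{ps}} \;=\; \max_{k \ge 1} \, \frac{\gamma\bigl((P^*)^k P^k\bigr)}{k}.
% The assumption \gamma_{\mathrm{ps}} > 0 ("non-null pseudo-spectral gap") yields
% Bernstein-type concentration for functions of the chain, which drives the risk bounds.
```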