AI Summary
This paper extends Schmidt-Hieber (2020)'s minimax optimality result for deep neural networks (DNNs), originally established under i.i.d. assumptions, to non-i.i.d. time-series data (e.g., Markov chains) and to generalized supervised learning tasks. Method: it derives the first PAC-Bayesian risk upper bound for DNNs under dependent data, combining a Bernstein inequality due to Paulin (2015), generalized Bayesian estimation, and the pseudo-spectral gap of the Markov chain to handle least-squares and logistic regression in a unified way. Contribution/Results: the derived risk bound matches the information-theoretic lower bound up to logarithmic factors. It recovers Schmidt-Hieber's optimal rate for regression and, for the first time, establishes a matching lower bound under the logistic loss, rigorously confirming the minimax optimality of DNNs for learning from dependent data.
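To fix ideas, the generic shape of a generalized Bayesian (Gibbs) estimator and of a PAC-Bayes-type oracle inequality is sketched below. The notation (empirical risk r_n, population risk R, prior pi, inverse temperature lambda, constant C) is assumed here for illustration and is not taken from the paper; the paper's exact statement, constants, and dependence terms differ.

```latex
% Illustrative sketch (notation assumed here, not the paper's own statement).
% r_n: empirical risk, R: population risk, \pi: prior, \lambda > 0: inverse temperature.
% Gibbs (generalized Bayesian) posterior:
\widehat{\rho}_\lambda(\mathrm{d}\theta) \;\propto\; \exp\bigl(-\lambda\, r_n(\theta)\bigr)\, \pi(\mathrm{d}\theta).
% Generic PAC-Bayes oracle inequality: with probability at least 1-\varepsilon,
\mathbb{E}_{\theta \sim \widehat{\rho}_\lambda}\bigl[R(\theta)\bigr]
  \;\le\; \inf_{\rho}\,\Bigl\{ \mathbb{E}_{\theta \sim \rho}\bigl[R(\theta)\bigr]
  \;+\; C\,\frac{\mathrm{KL}(\rho \,\|\, \pi) + \log(1/\varepsilon)}{\lambda} \Bigr\},
% where C accounts for the data dependence (here controlled via the pseudo-spectral gap).
```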
Abstract
In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation over a large class of functions defined by composition. In this paper, we extend these results in several directions. First, we remove the i.i.d. assumption on the observations to allow for some time dependence: the observations are assumed to form a Markov chain with a non-null pseudo-spectral gap. We then study a more general class of machine learning problems, which includes least-squares and logistic regression as special cases. Leveraging PAC-Bayes oracle inequalities and a version of the Bernstein inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-squares regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.
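For orientation on the dependence assumption, the pseudo-spectral gap of Paulin (2015) for a Markov kernel P is usually defined as below; this is a sketch of the standard definition, not the paper's own statement, and the operator notation is assumed here.

```latex
% Pseudo-spectral gap of a Markov kernel P (Paulin, 2015); sketch for orientation.
% P^* is the adjoint of P in L^2 of the stationary distribution, and \gamma(\cdot)
% denotes the spectral gap of a self-adjoint kernel.
\gamma_{\mathrm{ps}} \;=\; \max_{k \ge 1} \, \frac{\gamma\bigl((P^*)^k P^k\bigr)}{k}.
% The assumption \gamma_{\mathrm{ps}} > 0 ("non-null pseudo-spectral gap") yields
% Bernstein-type concentration for functions of the chain, which drives the risk bounds.
```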