Global Convergence Rate of Deep Equilibrium Models with General Activations

📅 2023-02-11
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work studies the global convergence rate of Deep Equilibrium Models (DEQs) with general smooth activation functions, specifically those with bounded first and second derivatives. For the quadratic loss, we establish linear convergence of gradient descent to a global optimum, extending the guarantee previously proved for ReLU DEQs to a broad class of activations. Methodologically, we construct a novel population Gram matrix and develop a new form of dual activation based on Hermite polynomial expansions. Combining these tools with an over-parameterization analysis, we lower-bound the smallest eigenvalue of the Gram matrix of the equilibrium point, which is the main obstacle because the new activations are generally non-homogeneous. The analysis removes the previous restriction to ReLU activations and thereby gives a more general theoretical foundation for the convergence of DEQs.
📝 Abstract
In a recent paper, Ling et al. investigated the over-parameterized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. This paper shows that this result still holds for DEQs with any general activation that has bounded first and second derivatives. Since such an activation is generally non-homogeneous, bounding the smallest eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this, we construct a novel population Gram matrix and develop a new form of dual activation with a Hermite polynomial expansion.
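To make the objects in the abstract concrete, here is a minimal sketch of how a DEQ equilibrium point and the Gram matrix of equilibrium points could be computed. It rests on assumptions that are not taken from the paper: a hypothetical single-layer form z = σ(A z + B x) with σ = tanh (which has bounded first and second derivatives), and A rescaled to spectral norm 0.5 so that the map is a contraction and plain fixed-point iteration converges. The model analysed by Ling et al. and in this paper may use different scalings and an output layer; the names deq_equilibrium and equilibrium_gram are illustrative only.

```python
import numpy as np


def deq_equilibrium(A, B, x, act=np.tanh, tol=1e-10, max_iter=1000):
    """Solve z = act(A @ z + B @ x) by plain fixed-point iteration.

    Assumes act is 1-Lipschitz and ||A||_2 < 1, so the map is a contraction.
    """
    z = np.zeros(A.shape[0])
    for _ in range(max_iter):
        z_next = act(A @ z + B @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def equilibrium_gram(A, B, X, act=np.tanh):
    """Gram matrix G[i, j] = <z_i, z_j> / m of the equilibrium points z_i."""
    Z = np.stack([deq_equilibrium(A, B, x, act) for x in X])  # shape (n, m)
    return Z @ Z.T / A.shape[0]


rng = np.random.default_rng(0)
m, d, n = 512, 10, 8  # width, input dimension, number of samples (illustrative)
A = rng.standard_normal((m, m))
A *= 0.5 / np.linalg.norm(A, 2)  # rescale so the spectral norm of A is 0.5
B = rng.standard_normal((m, d)) / np.sqrt(d)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs

G = equilibrium_gram(A, B, X)
lam_min = np.linalg.eigvalsh(G).min()  # eigvalsh: eigenvalues of a symmetric matrix
print(f"smallest eigenvalue of the equilibrium Gram matrix: {lam_min:.4e}")
```

For distinct inputs and a wide layer, the rows of Z are generically linearly independent, so the printed eigenvalue is positive; the paper's contribution is a quantitative lower bound on this quantity, which a numerical check like this cannot provide.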
Problem


Global convergence rate of DEQs
General activation functions
Bounding Gram matrix eigenvalues
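Why a lower bound on the smallest Gram eigenvalue yields a linear rate can be seen in an idealized setting. The sketch below assumes linearized (kernel-regime) dynamics in which the predictions f evolve as f_{k+1} = f_k - η H (f_k - y) for a fixed positive-definite Gram matrix H. With η = 1/λ_max(H), every eigenvalue of I - ηH lies in [0, 1 - ηλ_min(H)], so ||f_k - y|| ≤ (1 - ηλ_min(H))^k ||f_0 - y||. The matrix H here is a random stand-in, not the paper's Gram matrix, and a full proof must additionally control how the Gram matrix changes during training.

```python
import numpy as np


def gd_residuals(H, y, f0, eta, steps):
    """Gradient descent on L(f) = 0.5 * ||f - y||^2 under the linearized
    dynamics f_{k+1} = f_k - eta * H @ (f_k - y); returns ||f_k - y|| per step."""
    f = f0.copy()
    norms = [np.linalg.norm(f - y)]
    for _ in range(steps):
        f = f - eta * H @ (f - y)
        norms.append(np.linalg.norm(f - y))
    return np.array(norms)


rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, 2 * n))
H = M @ M.T / (2 * n)  # random positive-definite stand-in for a Gram matrix
eigs = np.linalg.eigvalsh(H)  # ascending order
lam_min, lam_max = eigs[0], eigs[-1]
eta = 1.0 / lam_max  # keeps every eigenvalue of I - eta * H in [0, 1)

y = rng.standard_normal(n)
f0 = np.zeros(n)
norms = gd_residuals(H, y, f0, eta, steps=200)

rate = 1.0 - eta * lam_min  # predicted per-step contraction factor
bound = norms[0] * rate ** np.arange(len(norms))
print(f"lambda_min = {lam_min:.4f}, predicted rate = {rate:.4f}")
print(f"residual after 200 steps: {norms[-1]:.3e} (bound {bound[-1]:.3e})")
```

The larger λ_min is relative to λ_max, the smaller the contraction factor, which is why the eigenvalue bound is the crux of a linear-rate result.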
Innovation


Deep Equilibrium Models
Hermite polynomial expansion
Population Gram matrix
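The Hermite expansion behind a dual activation can be illustrated with the classical construction; this is the standard dual activation, not the new form introduced in this paper. For an activation σ with normalized probabilists' Hermite coefficients a_n = E[σ(g) He_n(g)] / sqrt(n!), g ~ N(0, 1), the dual activation k(ρ) = E[σ(u) σ(v)] for standard Gaussians u, v with correlation ρ equals the sum over n of a_n² ρ^n. The sketch computes the coefficients for tanh by Gauss-Hermite quadrature (numpy.polynomial.hermite_e.hermegauss) and checks the series against a direct two-dimensional quadrature; the truncation at 30 terms and the quadrature degree of 80 are arbitrary choices, and the function names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_coeffs(act, n_terms=30, deg=80):
    """Normalized Hermite coefficients a_n = E[act(g) He_n(g)] / sqrt(n!), g ~ N(0, 1).

    He_n are the probabilists' Hermite polynomials; the expectation is computed
    with Gauss-Hermite quadrature for the weight exp(-x^2 / 2).
    """
    x, w = He.hermegauss(deg)
    w = w / np.sqrt(2 * np.pi)  # rescale so the weights sum to 1
    fx = act(x)
    coeffs = np.empty(n_terms)
    for n in range(n_terms):
        basis = np.zeros(n + 1)
        basis[n] = 1.0  # coefficient vector selecting He_n
        coeffs[n] = np.sum(w * fx * He.hermeval(x, basis)) / math.sqrt(math.factorial(n))
    return coeffs


def dual_activation_series(coeffs, rho):
    """Truncated Hermite series k(rho) = sum_n a_n^2 rho^n."""
    return float(np.sum(coeffs**2 * rho ** np.arange(len(coeffs))))


def dual_activation_direct(act, rho, deg=80):
    """k(rho) = E[act(u) act(v)] with corr(u, v) = rho, by 2-D Gauss-Hermite quadrature."""
    x, w = He.hermegauss(deg)
    g1, g2 = np.meshgrid(x, x, indexing="ij")
    weights = np.outer(w, w) / (2 * np.pi)  # product weights summing to 1
    u = g1
    v = rho * g1 + np.sqrt(1.0 - rho**2) * g2  # v ~ N(0, 1), corr(u, v) = rho
    return float(np.sum(weights * act(u) * act(v)))


coeffs = hermite_coeffs(np.tanh)
for rho in (0.0, 0.3, 0.6):
    print(f"rho={rho}: series={dual_activation_series(coeffs, rho):.6f}, "
          f"direct={dual_activation_direct(np.tanh, rho):.6f}")
```

Since tanh is odd, its even-order coefficients vanish, and because every term a_n² ρ^n is nonnegative, the series makes the dependence of k on the correlation ρ explicit term by term.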
Lan V. Truong
School of Mathematics, Statistics and Actuarial Science, University of Essex