🤖 AI Summary
To address the inefficiency of feature selection and weight sparsification in large-scale neural network training, this paper proposes the first exact, finite-step convergent projection algorithm onto the $\ell_{1,\infty}$-norm ball, with time complexity $\mathcal{O}(nm + J\log(nm))$, where $J$ decreases significantly as sparsity increases, yielding near-linear empirical runtime. The method integrates divide-and-conquer sorting, adaptive threshold search, and an embedded projection mechanism operating during training. In ultra-sparse regimes, such as bioinformatics applications with fewer than 2% relevant features, it substantially accelerates autoencoder training while enabling high-accuracy feature selection. Experiments demonstrate that the algorithm outperforms existing methods in projection speed, exhibiting robustness and scalability across both general-purpose and highly sparse learning tasks.
📝 Abstract
Seeking sparsity is nowadays crucial to speed up the training of large-scale neural networks. Projections onto the $\ell_{1}$ and $\ell_{1,\infty}$ norm balls are among the most efficient techniques to sparsify and reduce the overall cost of neural networks. In this paper, we introduce a new projection algorithm for the $\ell_{1,\infty}$ norm ball. Its worst-case time complexity is $\mathcal{O}(nm+J\log(nm))$ for a matrix in $\mathbb{R}^{n\times m}$. $J$ is a term that tends to 0 when the sparsity is high, and to $n\times m$ in the worst case. The algorithm is easy to implement and is guaranteed to converge to the exact solution in finite time. Moreover, we propose to incorporate the $\ell_{1,\infty}$ ball projection while training an autoencoder, to enforce feature selection and sparsity of the weights. Sparsification is applied to the encoder, primarily to perform feature selection, motivated by our application in biology, where only a very small fraction ($<$ 2%) of the data is relevant. We show that in both the biological and the general sparse settings, our method is the fastest.
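To make the problem concrete: the $\ell_{1,\infty}$ ball of radius $C$ is $\{X : \sum_i \max_j |X_{ij}| \le C\}$, i.e. the sum of the row-wise maxima is bounded. The paper's exact finite-step algorithm is not reproduced here; the sketch below is an *approximate* baseline that bisects on the Lagrange multiplier $\lambda$, using the Moreau identity that the prox of $\lambda\,\|\cdot\|_\infty$ applied to a row $w$ equals $w$ minus the $\ell_1$-ball projection of $w$ at radius $\lambda$. All function names are illustrative, not from the paper.

```python
import numpy as np

def project_l1(v, z):
    """Euclidean projection of vector v onto the l1 ball of radius z
    (sort-based method of Duchi et al., 2008)."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest k (0-indexed rho) with u_k * k > (cumsum_k - z)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def project_l1_inf(W, C, tol=1e-9, max_iter=100):
    """Approximate Euclidean projection of W onto {X : sum_i max_j |X_ij| <= C},
    by bisection on the Lagrange multiplier lam. Per-row subproblem uses the
    Moreau identity: prox_{lam*||.||_inf}(w) = w - project_l1(w, lam).
    NOT the paper's exact finite-step algorithm; an illustrative baseline."""
    if np.abs(W).max(axis=1).sum() <= C:   # already inside the ball
        return W.copy()
    def shrink(lam):
        return np.array([w - project_l1(w, lam) for w in W])
    lo, hi = 0.0, np.abs(W).sum(axis=1).max()  # at hi, every row shrinks to 0
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        norm = np.abs(shrink(lam)).max(axis=1).sum()
        if norm > C:
            lo = lam                       # not shrunk enough
        else:
            hi = lam                       # feasible; tighten from above
        if hi - lo < tol:
            break
    return shrink(hi)                      # feasible iterate
```

The sum of row maxima is continuous and nonincreasing in $\lambda$, so bisection brackets the boundary value; the paper's contribution is precisely to avoid this iterative search and land on the exact solution in finitely many steps.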