🤖 AI Summary
To address the inefficiency of feature selection and weight sparsification in large-scale neural network training, this paper proposes the first exact, finite-step convergent projection algorithm onto the $\ell_{1,\infty}$-norm ball, with time complexity $\mathcal{O}(nm + J\log(nm))$, where $J$ decreases significantly as sparsity increases, yielding near-linear empirical runtime. The method integrates divide-and-conquer sorting, adaptive threshold search, and an embedded projection mechanism operating during training. In ultra-sparse regimes, such as bioinformatics applications with fewer than 2% relevant features, it substantially accelerates autoencoder training while enabling high-accuracy feature selection. Experiments demonstrate that the algorithm outperforms existing methods in projection speed, exhibiting robustness and scalability across both general-purpose and highly sparse learning tasks.
📝 Abstract
Seeking sparsity is nowadays crucial to speed up the training of large-scale neural networks. Projections onto the $\ell_{1}$ and $\ell_{1,\infty}$ norm balls are among the most efficient techniques to sparsify and reduce the overall cost of neural networks. In this paper, we introduce a new projection algorithm for the $\ell_{1,\infty}$ norm ball. Its worst-case time complexity is $\mathcal{O}(nm+J\log(nm))$ for a matrix in $\mathbb{R}^{n\times m}$. $J$ is a term that tends to 0 when the sparsity is high, and to $n\times m$ in the worst case. The algorithm is easy to implement and is guaranteed to converge to the exact solution in finite time. Moreover, we propose to incorporate the $\ell_{1,\infty}$ ball projection while training an autoencoder, to enforce feature selection and sparsity of the weights. Sparsification is applied to the encoder, primarily to perform feature selection, motivated by our application in biology, where only a very small fraction ($<$ 2%) of the data is relevant. We show that in both the biological and the general sparse settings, our method is the fastest.
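To make the problem concrete: the $\ell_{1,\infty}$ ball of radius $C$ is $\{X : \sum_i \max_j |X_{ij}| \le C\}$, i.e. the sum of the row-wise maxima is bounded. The paper's exact finite-step algorithm is not reproduced here; the sketch below is an *approximate* baseline that bisects on the Lagrange multiplier $\lambda$, using the Moreau identity that the prox of $\lambda\,\|\cdot\|_\infty$ applied to a row $w$ equals $w$ minus the $\ell_1$-ball projection of $w$ at radius $\lambda$. All function names are illustrative, not from the paper.

```python
import numpy as np

def project_l1(v, z):
    """Euclidean projection of vector v onto the l1 ball of radius z
    (sort-based method of Duchi et al., 2008)."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]           # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest k (0-indexed rho) with u_k * k > (cumsum_k - z)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - z)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def project_l1_inf(W, C, tol=1e-9, max_iter=100):
    """Approximate Euclidean projection of W onto {X : sum_i max_j |X_ij| <= C},
    by bisection on the Lagrange multiplier lam. Per-row subproblem uses the
    Moreau identity: prox_{lam*||.||_inf}(w) = w - project_l1(w, lam).
    NOT the paper's exact finite-step algorithm; an illustrative baseline."""
    if np.abs(W).max(axis=1).sum() <= C:   # already inside the ball
        return W.copy()
    def shrink(lam):
        return np.array([w - project_l1(w, lam) for w in W])
    lo, hi = 0.0, np.abs(W).sum(axis=1).max()  # at hi, every row shrinks to 0
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        norm = np.abs(shrink(lam)).max(axis=1).sum()
        if norm > C:
            lo = lam                       # not shrunk enough
        else:
            hi = lam                       # feasible; tighten from above
        if hi - lo < tol:
            break
    return shrink(hi)                      # feasible iterate
```

The sum of row maxima is continuous and nonincreasing in $\lambda$, so bisection brackets the boundary value; the paper's contribution is precisely to avoid this iterative search and land on the exact solution in finitely many steps.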