Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional non-negative matrix factorization (NMF) methods—based on least squares or L1-norm loss—when applied to sparse non-negative data contaminated by heavy-tailed noise, outliers, or spurious zeros. To enhance robustness and promote factor sparsity, the authors propose a weighted L1-NMF model that employs component-wise L1 loss with adjustable weights. They develop a sparse coordinate descent (sCD) algorithm that efficiently solves subproblems via weighted medians, achieving computational complexity proportional only to the number of non-zero data entries. The study theoretically establishes that L1-NMF is NP-hard even in the rank-one case and reveals its inherent tendency to induce sparse factors when given sparse inputs. Experiments on both synthetic and real-world sparse datasets demonstrate that the proposed method significantly outperforms existing approaches, particularly in the presence of noise and outliers.
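The "component-wise L1 loss with adjustable weights" described above can be written down explicitly. The following is a plausible form inferred from the summary and abstract, not the paper's exact notation: entries where the data is nonzero get unit weight, while entries where the data is zero get a tunable penalty weight $\lambda$ (note that for those entries $|X_{ij} - (WH)_{ij}| = (WH)_{ij}$, since $W, H \geq 0$):

```latex
\[
\min_{W \geq 0,\; H \geq 0} \;\; \sum_{i,j} \Lambda_{ij} \, \bigl| X_{ij} - (WH)_{ij} \bigr|,
\qquad
\Lambda_{ij} =
\begin{cases}
1 & \text{if } X_{ij} \neq 0, \\
\lambda & \text{if } X_{ij} = 0.
\end{cases}
\]
```

Setting $\lambda = 1$ recovers plain L1-NMF; decreasing $\lambda$ relaxes the penalty on fitting the zeros, which counteracts the over-sparsification caused by false zeros in the data.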
📝 Abstract
Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, solutions that are too sparse might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our new proposed model (wL1-NMF) and algorithm (sCD).
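The abstract's key computational idea, that each coordinate-descent subproblem of L1-NMF reduces to a weighted median, can be sketched in a few lines. The point is that minimizing $\sum_i |x_i - a_i h|$ over a single scalar $h \geq 0$ is the same as minimizing $\sum_i |a_i| \cdot |x_i/a_i - h|$, i.e., a weighted median of the ratios $x_i/a_i$ with weights $|a_i|$, projected onto the nonnegative half-line. This is a toy illustration of that reduction, not the authors' sCD algorithm; the function names are hypothetical:

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - t| over t.

    Standard lower weighted median: sort the values, then take the first
    one at which the cumulative weight reaches half the total weight.
    """
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]

def update_coordinate(x, a):
    """One scalar CD step: argmin_{h >= 0} sum_i |x_i - a_i * h|.

    Rewriting each term as |a_i| * |x_i/a_i - h| (for a_i != 0) shows the
    minimizer is a weighted median of the ratios, clipped to h >= 0.
    Entries with a_i == 0 contribute a constant and are skipped.
    """
    mask = a != 0
    if not mask.any():
        return 0.0  # objective does not depend on h; keep it at zero
    h = weighted_median(x[mask] / a[mask], np.abs(a[mask]))
    return max(h, 0.0)
```

In the full factorization problem, `x` would be a column of the current residual and `a` the corresponding column of `W` (or row of `H`); the sparsity-exploiting part of sCD, which the abstract credits with complexity proportional to the number of nonzeros, is not reproduced here.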
Problem

Research questions and friction points this paper is trying to address.

Nonnegative Matrix Factorization
L1 norm
Sparse Data
Outliers
Heavy-tailed Noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

L1-NMF
weighted L1-NMF
sparsity
coordinate descent
weighted median
Giovanni Seraghiti
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium, and Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze, Viale Morgagni 40/44, 50134 Firenze, Italy
Kévin Dubrulle
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium
Arnaud Vandaele
Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Belgium
Operational Research, Optimization, Discrete Mathematics, Numerical Analysis, Numerical Linear Algebra
Nicolas Gillis
University of Mons
optimization, data science, numerical linear algebra, signal processing