Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional non-negative matrix factorization (NMF) methods—based on least squares or L1-norm loss—when applied to sparse non-negative data contaminated by heavy-tailed noise, outliers, or spurious zeros. To enhance robustness and promote factor sparsity, the authors propose a weighted L1-NMF model that employs component-wise L1 loss with adjustable weights. They develop a sparse coordinate descent (sCD) algorithm that efficiently solves subproblems via weighted medians, achieving computational complexity proportional only to the number of non-zero data entries. The study theoretically establishes that L1-NMF is NP-hard even in the rank-one case and reveals its inherent tendency to induce sparse factors when given sparse inputs. Experiments on both synthetic and real-world sparse datasets demonstrate that the proposed method significantly outperforms existing approaches, particularly in the presence of noise and outliers.
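The "component-wise L1 loss with adjustable weights" described above can be written down explicitly. The following is a plausible form inferred from the summary and abstract, not the paper's exact notation: entries where the data is nonzero get unit weight, while entries where the data is zero get a tunable penalty weight $\lambda$ (note that for those entries $|X_{ij} - (WH)_{ij}| = (WH)_{ij}$, since $W, H \geq 0$):

```latex
\[
\min_{W \geq 0,\; H \geq 0} \;\; \sum_{i,j} \Lambda_{ij} \, \bigl| X_{ij} - (WH)_{ij} \bigr|,
\qquad
\Lambda_{ij} =
\begin{cases}
1 & \text{if } X_{ij} \neq 0, \\
\lambda & \text{if } X_{ij} = 0.
\end{cases}
\]
```

Setting $\lambda = 1$ recovers plain L1-NMF; decreasing $\lambda$ relaxes the penalty on fitting the zeros, which counteracts the over-sparsification caused by false zeros in the data.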
📝 Abstract
Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, solutions that are too sparse might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our new proposed model (wL1-NMF) and algorithm (sCD).
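The abstract's key computational idea, that each coordinate-descent subproblem of L1-NMF reduces to a weighted median, can be sketched in a few lines. The point is that minimizing $\sum_i |x_i - a_i h|$ over a single scalar $h \geq 0$ is the same as minimizing $\sum_i |a_i| \cdot |x_i/a_i - h|$, i.e., a weighted median of the ratios $x_i/a_i$ with weights $|a_i|$, projected onto the nonnegative half-line. This is a toy illustration of that reduction, not the authors' sCD algorithm; the function names are hypothetical:

```python
import numpy as np

def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - t| over t.

    Standard lower weighted median: sort the values, then take the first
    one at which the cumulative weight reaches half the total weight.
    """
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return v[idx]

def update_coordinate(x, a):
    """One scalar CD step: argmin_{h >= 0} sum_i |x_i - a_i * h|.

    Rewriting each term as |a_i| * |x_i/a_i - h| (for a_i != 0) shows the
    minimizer is a weighted median of the ratios, clipped to h >= 0.
    Entries with a_i == 0 contribute a constant and are skipped.
    """
    mask = a != 0
    if not mask.any():
        return 0.0  # objective does not depend on h; keep it at zero
    h = weighted_median(x[mask] / a[mask], np.abs(a[mask]))
    return max(h, 0.0)
```

In the full factorization problem, `x` would be a column of the current residual and `a` the corresponding column of `W` (or row of `H`); the sparsity-exploiting part of sCD, which the abstract credits with complexity proportional to the number of nonzeros, is not reproduced here.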
Problem

Research questions and friction points this paper is trying to address.

Nonnegative Matrix Factorization
L1 norm
Sparse Data
Outliers
Heavy-tailed Noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

L1-NMF
weighted L1-NMF
sparsity
coordinate descent
weighted median
Giovanni Seraghiti
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium, and Dipartimento di Ingegneria Industriale, Università degli Studi di Firenze, Viale Morgagni 40/44, 50134 Firenze, Italy
Kévin Dubrulle
University of Mons, Rue de Houdain 9, 7000 Mons, Belgium
Arnaud Vandaele
Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Belgium
Operational Research, Optimization, Discrete Mathematics, Numerical Analysis, Numerical Linear Algebra
Nicolas Gillis
University of Mons
optimization, data science, numerical linear algebra, signal processing