Algorithms for Boolean Matrix Factorization using Integer Programming and Heuristics

📅 2025-12-03
🤖 AI Summary
This work addresses large-scale Boolean matrix factorization (BMF), aiming to improve interpretability and approximation accuracy for binary data. The proposed method uses an alternating optimization framework in which each factor subproblem is solved exactly via integer programming, combined with a strategy that selects an optimal subset of rank-one factors from multiple runs. Greedy and local-search heuristics speed up the method and address the scalability limits of integer programming, and a lightweight Boolean data structure implemented in C++ reduces memory use and computation time. The approach also handles missing values. Experiments on multiple real-world datasets show that the algorithms outperform state-of-the-art BMF methods in both reconstruction accuracy and runtime, and that they are particularly robust when entries are missing.
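
The summary mentions missing-value handling. One common way to write a BMF objective with missing entries is given here as an assumption, not as the paper's exact formulation. Let M ∈ {0,1}^{m×n} be the data matrix, Ω the set of observed positions, and r the factorization rank; the Boolean product combines factors with OR and AND:

```latex
\min_{W \in \{0,1\}^{m \times r},\; H \in \{0,1\}^{r \times n}}
  \sum_{(i,j) \in \Omega}
  \Bigl| M_{ij} - \bigvee_{k=1}^{r} \bigl( W_{ik} \wedge H_{kj} \bigr) \Bigr|
```

Because every term is 0 or 1, the absolute difference equals the squared difference, so the sum counts mismatched observed entries; when Ω contains every position it reduces to the full-data objective.
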

📝 Abstract
Boolean matrix factorization (BMF) approximates a given binary input matrix as the product of two smaller binary factors. Unlike binary matrix factorization based on standard arithmetic, BMF employs the Boolean OR and AND operations for the matrix product, which improves interpretability and reduces the approximation error. It is also used in role mining and computer vision. In this paper, we first propose algorithms for BMF that perform alternating optimization (AO) of the factor matrices, where each subproblem is solved via integer programming (IP). We then design different approaches to further enhance AO-based algorithms by selecting an optimal subset of rank-one factors from multiple runs. To address the scalability limits of IP-based methods, we introduce new greedy and local-search heuristics. We also construct a new C++ data structure for Boolean vectors and matrices that is significantly faster than existing ones and is of independent interest, allowing our heuristics to scale to large datasets. We illustrate the performance of all our proposed methods and compare them with the state of the art on various real datasets, both with and without missing data, including applications in topic modeling and imaging.
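
The contrast between the Boolean product and standard arithmetic can be seen on a small example. In this Python sketch (the helper names are illustrative, not code from the paper), the third row of W selects both rank-one patterns in H. Standard arithmetic sums their overlap in the second column to 2, while the Boolean OR keeps every entry binary.

```python
from typing import List

Matrix = List[List[int]]


def standard_product(W: Matrix, H: Matrix) -> Matrix:
    """Ordinary matrix product: entry (i, j) is a sum of products."""
    r, n = len(H), len(H[0])
    return [[sum(W[i][k] * H[k][j] for k in range(r)) for j in range(n)]
            for i in range(len(W))]


def boolean_product(W: Matrix, H: Matrix) -> Matrix:
    """Boolean product: entry (i, j) is an OR of ANDs, so it stays in {0, 1}."""
    r, n = len(H), len(H[0])
    return [[int(any(W[i][k] and H[k][j] for k in range(r))) for j in range(n)]
            for i in range(len(W))]


W = [[1, 0], [0, 1], [1, 1]]  # third row uses both rank-one factors
H = [[1, 1, 0], [0, 1, 1]]    # the two factors overlap in the second column
print(standard_product(W, H))  # [[1, 1, 0], [0, 1, 1], [1, 2, 1]]
print(boolean_product(W, H))   # [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
```

For the binary matrix M = [[1, 1, 0], [0, 1, 1], [1, 1, 1]], the Boolean product reproduces M exactly, whereas the standard product overshoots at the overlapping entry.
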
Problem

Research questions and friction points this paper is trying to address.

Develop integer programming and heuristic algorithms for Boolean matrix factorization
Address scalability issues of IP methods with greedy and local-search heuristics
Create efficient C++ data structures to handle large Boolean datasets
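
The C++ data structure itself is not described on this page, so the following is only a generic illustration of bit packing, a standard technique for Boolean vectors; whether the authors' structure works this way is an assumption. Each binary row is stored as the bits of one integer, so OR merges rows in one operation, and XOR followed by a popcount counts mismatches. A mask of observed positions skips missing entries. `pack`, `boolean_row`, and `row_error` are hypothetical names, and Python integers stand in for machine words.

```python
from typing import List


def pack(bits: List[int]) -> int:
    """Pack a 0/1 list into an int; bit j stores entry j."""
    x = 0
    for j, b in enumerate(bits):
        if b:
            x |= 1 << j
    return x


def unpack(x: int, n: int) -> List[int]:
    """Inverse of pack for a row of length n."""
    return [(x >> j) & 1 for j in range(n)]


def popcount(x: int) -> int:
    """Number of set bits (int.bit_count() also works on Python 3.10+)."""
    return bin(x).count("1")


def boolean_row(w_bits: int, H_rows: List[int]) -> int:
    """Row of W o H: OR of the packed rows of H selected by the set bits of w_bits."""
    acc, k = 0, 0
    while w_bits:
        if w_bits & 1:
            acc |= H_rows[k]
        w_bits >>= 1
        k += 1
    return acc


def row_error(m_bits: int, observed: int, w_bits: int, H_rows: List[int]) -> int:
    """Mismatches between a packed data row and its reconstruction, on observed bits only."""
    return popcount((m_bits ^ boolean_row(w_bits, H_rows)) & observed)


H_rows = [pack([1, 1, 0]), pack([0, 1, 1])]
row = boolean_row(pack([1, 1]), H_rows)
print(unpack(row, 3))  # [1, 1, 1]
print(row_error(pack([1, 0, 1]), pack([1, 1, 1]), pack([1, 1]), H_rows))  # 1
print(row_error(pack([1, 0, 1]), pack([1, 0, 1]), pack([1, 1]), H_rows))  # 0
```

In the last call the middle entry is marked unobserved, so the mismatch there is ignored.
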
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternating optimization via integer programming
Greedy and local-search heuristics for scalability
Fast C++ data structure for Boolean operations
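
Alternating optimization fixes one factor and solves for the other. With H fixed, the rows of W decouple, and each row becomes an independent problem over {0,1}^r. The paper solves these subproblems with integer programming; this sketch swaps in brute-force enumeration of all 2^r candidate rows, which is also exact but only practical for very small ranks r. The H-step reuses the same routine on the transpose, since (W∘H)^T = H^T∘W^T. The function names, the random initialization, and the fixed iteration count are illustrative assumptions.

```python
import random
from itertools import product
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def bool_row(w: Sequence[int], H: Matrix) -> List[int]:
    """One row of the Boolean product: OR of the rows of H selected by w."""
    return [int(any(w[k] and H[k][j] for k in range(len(H))))
            for j in range(len(H[0]))]


def mismatches(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def error(M: Matrix, W: Matrix, H: Matrix) -> int:
    """Total number of entries where M differs from W o H."""
    return sum(mismatches(m, bool_row(w, H)) for m, w in zip(M, W))


def update_rows(M: Matrix, H: Matrix) -> Matrix:
    """Exact update of W for fixed H by enumerating all 2^r binary rows
    (a stand-in for the integer program used in the paper)."""
    r = len(H)
    return [list(min(product((0, 1), repeat=r),
                     key=lambda w: mismatches(m, bool_row(w, H))))
            for m in M]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def alternating_bmf(M: Matrix, r: int, iters: int = 10,
                    seed: int = 0) -> Tuple[Matrix, Matrix, List[int]]:
    rng = random.Random(seed)
    H = [[rng.randint(0, 1) for _ in range(len(M[0]))] for _ in range(r)]
    history = []
    for _ in range(iters):
        W = update_rows(M, H)                                   # rows of W, H fixed
        H = transpose(update_rows(transpose(M), transpose(W)))  # columns of H, W fixed
        history.append(error(M, W, H))
    return W, H, history


M = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
W, H, history = alternating_bmf(M, r=2)
print(history)
```

Each step is an exact minimization over one block of variables, so the recorded error never increases, although the final factors can still depend on the initialization.
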
Christos Kolomvakis
Department of Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Rue de Houdain 9, 7000 Mons, Belgium
Thomas Bobille
Department of Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Rue de Houdain 9, 7000 Mons, Belgium
A. Vandaele
Department of Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Rue de Houdain 9, 7000 Mons, Belgium
Nicolas Gillis
University of Mons
optimization, data science, numerical linear algebra, signal processing