Finding Rule-Interpretable Non-Negative Data Representation

📅 2022-06-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited interpretability of Non-negative Matrix Factorization (NMF) latent factors in biomedical applications, and the lack of parts-based, low-dimensional representations in conventional rule-based methods, this paper proposes RuleNMF, a non-negative data representation framework that embeds symbolic rules into NMF. Each latent factor is described by a subset of the input rules. These rules have high entity coverage and specify interval conditions on numerical attributes or allowed values of categorical attributes. This preserves the parts-based structure of NMF while giving each factor a precise, human-readable meaning. The approach is reported to be useful for supervised multi-label NMF and focused embedding. It also identifies the attributes that matter most for each latent factor and supports analysis of the relations between these attributes.
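The central idea above, describing a latent factor by a subset of interval- or category-based rules, can be illustrated with a small sketch. It rests on stated assumptions: a rule is a conjunction of conditions, a factor's "active" entities are those whose loading is at or above a chosen threshold, and a rule is taken to describe the factor when its support overlaps those entities with sufficiently high Jaccard similarity. The `Rule` class, `describe_factor`, `threshold` and `min_jaccard` are hypothetical names introduced here. The paper's method constructs the representation so that factors are described by rules, whereas this post-hoc matching only illustrates what such a description looks like.

```python
# Hypothetical illustration: post-hoc matching of rules to one latent factor.
# This is NOT the paper's algorithm, only a sketch of the rule vocabulary.
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A conjunction of interval conditions (numerical attributes) and
    category conditions (categorical attributes). Hypothetical structure."""
    intervals: dict = field(default_factory=dict)   # attr -> (low, high), inclusive
    categories: dict = field(default_factory=dict)  # attr -> set of allowed values

    def covers(self, entity: dict) -> bool:
        # Every interval condition must hold...
        for attr, (low, high) in self.intervals.items():
            if not (low <= entity[attr] <= high):
                return False
        # ...and every categorical value must be in its allowed set.
        for attr, allowed in self.categories.items():
            if entity[attr] not in allowed:
                return False
        return True


def support(rule, entities):
    """Indices of the entities that satisfy the rule."""
    return {i for i, e in enumerate(entities) if rule.covers(e)}


def coverage(rules, entities):
    """Fraction of entities covered by at least one rule."""
    covered = set().union(*(support(r, entities) for r in rules))
    return len(covered) / len(entities)


def describe_factor(factor, rules, entities, threshold, min_jaccard=0.5):
    """Return (rule index, Jaccard similarity) pairs, best first, for rules
    whose support overlaps the entities loading strongly on the factor.
    `threshold` and `min_jaccard` are illustrative assumptions."""
    active = {i for i, v in enumerate(factor) if v >= threshold}
    matches = []
    for k, rule in enumerate(rules):
        s = support(rule, entities)
        union = active | s
        jaccard = len(active & s) / len(union) if union else 0.0
        if jaccard >= min_jaccard:
            matches.append((k, jaccard))
    return sorted(matches, key=lambda m: -m[1])
```

For example, with entities described by `age` and `smoker`, a factor that loads on the two older smokers would be matched to a rule such as `Rule(intervals={"age": (60, 80)}, categories={"smoker": {"yes"}})`, which states both the attributes involved and the exact interval and category values they take.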

📝 Abstract

Non-negative Matrix Factorization (NMF) is a widely used technique for obtaining a parts-based, lower-dimensional and non-negative representation of non-negative data. It is popular across many research fields. Scientists in biology, medicine and pharmacy often prefer NMF over other dimensionality reduction approaches (such as PCA) because its non-negativity naturally fits the characteristics of the domain problem and its results are easier to analyze and understand. Despite these advantages, it can still be hard to obtain an exact characterization and interpretation of the latent factors NMF produces, due to their numerical nature. Rule-based approaches, on the other hand, are often considered more interpretable but lack a parts-based interpretation. In this work, we present a version of NMF that merges rule-based descriptions with the advantages of the parts-based representation offered by NMF. Given numerical input data with non-negative entries and a set of rules with high entity coverage, the approach creates a lower-dimensional non-negative representation of the input data such that its factors are described by an appropriate subset of the input rules. In addition to revealing important attributes for latent factors, it allows analyzing relations between these attributes and provides the exact numerical intervals or categorical values they take. The proposed approach provides numerous advantages in tasks such as focused embedding and supervised multi-label NMF.
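The abstract builds on standard NMF: a non-negative matrix X (n x m) is approximated as W H, where W (n x k) and H (k x m) are both non-negative and k is small. As a point of reference, here is a minimal pure-Python sketch of plain NMF using the classic Lee and Seung multiplicative updates for the Frobenius loss. It is the unconstrained baseline, not the rule-based variant proposed in the paper; the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) and the defaults for `iters`, `eps` and `seed` are illustrative assumptions.

```python
# Baseline (unconstrained) NMF via Lee-Seung multiplicative updates.
# Illustrative sketch only; it does not include the paper's rule descriptions.
import random


def matmul(A, B):
    """Matrix product A @ B for lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(X, k, iters=500, eps=1e-9, seed=0):
    """Factorise non-negative X (n x m) as W (n x k) @ H (k x m).
    Multiplicative updates keep W and H non-negative when initialised positive."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H); eps guards against division by zero
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def frobenius_error(X, W, H):
    """||X - W H||_F, the loss the updates above do not increase."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5
```

Each column of W is a non-negative "part" and each row of H gives how strongly every input column uses that part. The numbers alone do not say what a part means, which is the interpretability gap the paper addresses by attaching rule descriptions to the factors.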

Problem

Research questions and friction points this paper is trying to address.

Non-negative Matrix Factorization

Data Simplification

Interpretability Enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-enhanced NMF

Feature Interaction

Complex Data Analysis

🔎 Similar Papers

A Unified Approach to Extract Interpretable Rules from Tree Ensembles via Integer Programming