Using Constraints to Discover Sparse and Alternative Subgroup Descriptions

📅 2024-06-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work aims to make subgroup discovery more interpretable by adding two types of constraints: sparsity, which limits the number of features used in a subgroup description, and alternatives, which require further descriptions that cover a similar set of data objects as a given subgroup but use different features. We introduce the search for alternative subgroup descriptions as a novel optimization problem and describe how to integrate both constraint types into heuristic subgroup-discovery methods. We also propose a Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search and is open to a variety of constraint types, and we prove that both constraint types lead to an NP-hard optimization problem. Experiments on 27 binary-classification datasets compare heuristic and solver-based search with and without constraints. Heuristic search often yields high-quality subgroups within a short runtime, also in scenarios with constraints.

📝 Abstract
Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset. Using constraints in subgroup discovery can enhance interpretability even further. In this article, we focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse. Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features. We describe how to integrate both constraint types into heuristic subgroup-discovery methods. Further, we propose a novel Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search for subgroups and is open to a variety of constraint types. Additionally, we prove that both constraint types lead to an NP-hard optimization problem. Finally, we employ 27 binary-classification datasets to compare algorithmic and solver-based search for unconstrained and constrained subgroup discovery. We observe that heuristic search methods often yield high-quality subgroups within a short runtime, also in scenarios with constraints.
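To make the sparsity constraint concrete, here is a minimal sketch in plain Python. It is not the paper's heuristic or SMT-based method: it assumes that a subgroup description is a set of per-feature intervals, uses weighted relative accuracy (WRAcc) as the quality measure, and searches exhaustively on a toy dataset. The function names and the parameter `k` (maximum number of restricted features) are hypothetical.

```python
from itertools import combinations, product

def covers(bounds, row):
    """True if the row satisfies every (lower, upper) interval in the description."""
    return all(lo <= row[j] <= hi for j, (lo, hi) in bounds.items())

def wracc(bounds, X, y):
    """Weighted relative accuracy (assumed quality measure) of a description."""
    members = [label for row, label in zip(X, y) if covers(bounds, row)]
    if not members:
        return 0.0
    coverage = len(members) / len(y)
    return coverage * (sum(members) / len(members) - sum(y) / len(y))

def best_sparse_subgroup(X, y, k):
    """Exhaustively search descriptions that restrict at most k features."""
    n_features = len(X[0])
    best_bounds, best_quality = {}, 0.0  # the empty description covers everything
    for size in range(1, k + 1):
        for features in combinations(range(n_features), size):
            # Candidate intervals per feature: all (lo, hi) pairs of observed values.
            per_feature = []
            for j in features:
                values = sorted({row[j] for row in X})
                per_feature.append(
                    [(lo, hi) for lo in values for hi in values if lo <= hi]
                )
            for intervals in product(*per_feature):
                bounds = dict(zip(features, intervals))
                quality = wracc(bounds, X, y)
                if quality > best_quality:
                    best_bounds, best_quality = bounds, quality
    return best_bounds, best_quality

# Toy data: two numeric features, binary target.
X = [(1, 5), (2, 6), (3, 1), (4, 2), (5, 7), (6, 8)]
y = [0, 0, 1, 1, 0, 0]
bounds, quality = best_sparse_subgroup(X, y, k=1)
print(bounds, round(quality, 4))
```

With `k=1`, the search may restrict only a single feature. Exhaustive enumeration like this scales poorly, which fits the abstract's interest in heuristic and solver-based search.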
Problem

Research questions and friction points this paper is trying to address.

Enhance interpretability using constraints in subgroup discovery
Find alternative subgroup descriptions using different features
Integrate constraints into heuristic subgroup-discovery methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse subgroup descriptions
Alternative subgroup optimization
SMT-based white-box optimization
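The idea of an alternative subgroup description can be sketched in the same toy setting. This is an illustrative assumption rather than the paper's formulation: the alternative must avoid every feature used in the original description, and similarity is measured as the fraction of data objects on which the two subgroup memberships agree. The function names and the parameter `k` are hypothetical.

```python
from itertools import combinations, product

def covers(bounds, row):
    """True if the row satisfies every (lower, upper) interval in the description."""
    return all(lo <= row[j] <= hi for j, (lo, hi) in bounds.items())

def membership(bounds, X):
    """Boolean subgroup-membership vector over all data objects."""
    return [covers(bounds, row) for row in X]

def similarity(a, b):
    """Fraction of data objects on which two membership vectors agree (assumed measure)."""
    return sum(u == v for u, v in zip(a, b)) / len(a)

def alternative_description(X, original, k):
    """Find a description that avoids the original's features yet covers similar objects."""
    target = membership(original, X)
    allowed = [j for j in range(len(X[0])) if j not in original]
    # Baseline: the empty description, which covers every data object.
    best_bounds, best_sim = {}, similarity(membership({}, X), target)
    for size in range(1, min(k, len(allowed)) + 1):
        for features in combinations(allowed, size):
            per_feature = []
            for j in features:
                values = sorted({row[j] for row in X})
                per_feature.append(
                    [(lo, hi) for lo in values for hi in values if lo <= hi]
                )
            for intervals in product(*per_feature):
                bounds = dict(zip(features, intervals))
                sim = similarity(membership(bounds, X), target)
                if sim > best_sim:
                    best_bounds, best_sim = bounds, sim
    return best_bounds, best_sim

X = [(1, 5), (2, 6), (3, 1), (4, 2), (5, 7), (6, 8)]
original = {0: (3, 4)}  # original subgroup, described via feature 0 only
alt_bounds, alt_sim = alternative_description(X, original, k=1)
print(alt_bounds, alt_sim)
```

In this toy example, a description on feature 1 reproduces the coverage of the original description on feature 0 exactly, so a user gets a second, equally valid view of the same set of data objects.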