Learning with Structure: Computing Consistent Subsets on Structurally-Regular Graphs

📅 2025-12-14
🤖 AI Summary
This paper studies the Minimum Consistent Subset (MCS) problem in graph metric spaces: given a vertex-labeled graph, find a smallest subset of vertices such that every vertex shares its label with at least one of its nearest neighbors in that subset. MCS underpins supervised clustering and instance selection but is computationally hard. The paper establishes, for the first time, that MCS is fixed-parameter tractable (FPT) with respect to two structural graph parameters, the vertex cover number ($vc$) and the neighborhood diversity ($nd$), extending beyond prior results restricted to trees or other highly constrained graph classes. The algorithms run in time $vc^{O(vc)} \cdot \text{poly}(n,c)$ and $nd^{O(nd)} \cdot \text{poly}(n,c)$, respectively, where $n$ is the number of vertices and $c$ the number of labels, so the dependence on $c$ stays polynomial. The approach integrates structural graph analysis, dynamic programming, and enumeration pruning. These results make computing MCS feasible on multi-labeled graphs whose vertex cover number or neighborhood diversity is small, even when the number of labels is large.

📝 Abstract
The Minimum Consistent Subset (MCS) problem arises naturally in the context of supervised clustering and instance selection. In supervised clustering, one aims to infer a meaningful partitioning of data using a small labeled subset. However, the sheer volume of training data in modern applications poses a significant computational challenge. The MCS problem formalizes this goal: given a labeled dataset $\mathcal{X}$ in a metric space, the task is to compute a smallest subset $S \subseteq \mathcal{X}$ such that every point in $\mathcal{X}$ shares its label with at least one of its nearest neighbors in $S$. Recently, the MCS problem has been extended to graph metrics, where distances are defined by shortest paths. Prior work has shown that MCS remains NP-hard even on simple graph classes like trees, though an algorithm with runtime $\mathcal{O}(2^{6c} \cdot n^6)$ is known for trees, where $c$ is the number of colors and $n$ the number of vertices. This raises the challenge of identifying graph classes that admit algorithms efficient in both $n$ and $c$. In this work, we study the Minimum Consistent Subset problem on graphs, focusing on two well-established measures: the vertex cover number ($vc$) and the neighborhood diversity ($nd$). We develop an algorithm with running time $vc^{\mathcal{O}(vc)} \cdot \text{Poly}(n,c)$, and another algorithm with runtime $nd^{\mathcal{O}(nd)} \cdot \text{Poly}(n,c)$. In the language of parameterized complexity, this implies that MCS is fixed-parameter tractable (FPT) parameterized by the vertex cover number and the neighborhood diversity. Notably, our algorithms remain efficient for arbitrarily many colors, as their complexity is polynomially dependent on the number of colors.
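
To make the consistency condition from the abstract concrete, the following is a minimal sketch for unweighted graphs, assuming hop-count shortest-path distances. It is an exponential brute-force baseline for tiny instances, not the paper's FPT algorithm; the function names `bfs_distances`, `is_consistent`, and `brute_force_mcs` and the adjacency-dict input format are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop-count shortest-path distances from `source` (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_consistent(adj, label, subset):
    """True if every vertex shares its label with at least one of its
    nearest neighbors in `subset` (ties at minimum distance all count)."""
    subset = set(subset)
    for v in adj:
        dist = bfs_distances(adj, v)
        reachable = [s for s in subset if s in dist]
        if not reachable:
            return False  # v cannot see any selected vertex
        d_min = min(dist[s] for s in reachable)
        nearest = [s for s in reachable if dist[s] == d_min]
        if not any(label[s] == label[v] for s in nearest):
            return False
    return True


def brute_force_mcs(adj, label):
    """Try subsets in order of increasing size; exponential in n."""
    vertices = sorted(adj)
    for k in range(1, len(vertices) + 1):
        for cand in combinations(vertices, k):
            if is_consistent(adj, label, cand):
                return set(cand)
    return set(vertices)  # only reached for an empty graph


# Path 0 - 1 - 2 - 3 with labels r, r, b, b (hypothetical toy instance).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
label = {0: "r", 1: "r", 2: "b", 3: "b"}
solution = brute_force_mcs(adj, label)
```

On this toy path no single vertex is consistent, because vertices of the other color would have a differently labeled nearest neighbor, so a minimum consistent subset needs one vertex of each color.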
Problem

Research questions and friction points this paper is trying to address.

Computes smallest labeled subset for supervised clustering
Extends Minimum Consistent Subset problem to graph metrics
Develops efficient algorithms using vertex cover and neighborhood diversity
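
The two parameters named above can be illustrated on small graphs. The sketch below uses the standard definitions: a vertex cover is a vertex set touching every edge, and two vertices $u, v$ have the same type when $N(u) \setminus \{v\} = N(v) \setminus \{u\}$, with the neighborhood diversity being the number of type classes. The helper names and the brute-force vertex cover search are assumptions for illustration only and are not taken from the paper.

```python
from itertools import combinations


def neighborhood_diversity_classes(adj):
    """Partition vertices into type classes; nd is the number of classes.
    'Same type' is an equivalence relation, so comparing against one
    representative per class is enough."""
    nbrs = {v: set(ns) for v, ns in adj.items()}
    classes = []
    for v in sorted(nbrs):
        for cls in classes:
            u = cls[0]
            if nbrs[u] - {v} == nbrs[v] - {u}:
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes


def min_vertex_cover_size(adj):
    """Smallest k such that some k vertices touch every edge (brute force)."""
    edges = {frozenset((u, w)) for u in adj for w in adj[u]}
    vertices = sorted(adj)
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            chosen = set(cand)
            if all(edge & chosen for edge in edges):
                return k
    return len(vertices)


# Star with center 0 and leaves 1, 2, 3 (hypothetical toy instance).
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
star_classes = neighborhood_diversity_classes(star)
star_vc = min_vertex_cover_size(star)
```

For the star, the leaves form one type class and the center another, so $nd = 2$, while the center alone covers every edge, so $vc = 1$. Both parameters stay small however many leaves are added, which is the kind of structural regularity the parameterized running times exploit.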
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameterized algorithms using vertex cover number
Fixed-parameter tractable approach with neighborhood diversity
Polynomial dependence on colors for graph MCS
Aritra Banik
Assistant professor, NISER
Algorithms
Mano Prakash Parthasarathi
North Carolina State University, Raleigh, NC, USA.
Venkatesh Raman
(Retd) The Institute of Mathematical Sciences, HBNI, Chennai, India.
Diya Roy
National Institute of Science, Education and Research, An OCC of Homi Bhabha National Institute, Bhubaneswar, India.
Abhishek Sahu
Visiting Faculty, NISER
Algorithms