GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

📅 2024-05-29
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses subset selection in metric spaces that jointly optimizes utility and diversity, formalized as the min-distance diversification with monotone submodular utility (MDMS) problem under a cardinality constraint. Because MDMS is NP-hard, the authors propose GIST, an approximation algorithm with a provable guarantee. GIST combines greedy independent set construction with distance thresholding: it approximates a series of maximum independent set problems with a bicriteria greedy algorithm and attains a 1/2-approximation guarantee. The authors also prove that approximating MDMS to within a factor of 0.5584 is NP-hard, so a gap remains between the guarantee and the hardness bound. Unlike methods that optimize utility or diversity alone, GIST outperforms existing benchmarks on ImageNet for single-shot subset selection in image classification, which supports modeling utility and diversity together.

📝 Abstract
We introduce a novel subset selection problem called min-distance diversification with monotone submodular utility ($\textsf{MDMS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of $\textsf{MDMS}$ is to maximize an objective function combining a monotone submodular utility term and a min-distance diversity term between any pair of selected points, subject to a cardinality constraint. We propose the $\texttt{GIST}$ algorithm, which achieves a $\frac{1}{2}$-approximation guarantee for $\textsf{MDMS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove that it is NP-hard to approximate $\textsf{MDMS}$ to within a factor of $0.5584$. Finally, we demonstrate that $\texttt{GIST}$ outperforms existing benchmarks on a real-world image classification task that studies single-shot subset selection for ImageNet.
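
The objective described in the abstract can be written out explicitly. What follows is a sketch under assumptions: the symbols $V$, $g$, $k$, $\lambda$, $\operatorname{dist}$, and $\operatorname{div}$ are notation introduced here for illustration, and the additive combination with a trade-off weight $\lambda$ is one natural reading of "combining" rather than a form stated in the abstract.

```latex
\max_{S \subseteq V,\ |S| \le k} \; f(S) = g(S) + \lambda \cdot \operatorname{div}(S),
\qquad
\operatorname{div}(S) = \min_{\substack{u, v \in S \\ u \neq v}} \operatorname{dist}(u, v)
```

Under this notation, for a fixed distance threshold $d$, a set with $\operatorname{div}(S) \ge d$ is exactly an independent set in the graph that joins every pair of points closer than $d$, which is where the independent set problems enter.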

Problem

Research questions and friction points this paper is trying to address.

Min-distance diversification with submodular utility
Maximize submodular utility and diversity
Approximate maximum independent set problems
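
To make the two terms of the objective concrete, here is a minimal Python sketch that scores a candidate subset. Everything in it is an illustrative assumption rather than the paper's setup: the coverage-style utility, the Euclidean metric, the weight `lam`, the convention that a subset with fewer than two points has diversity 0, and the names `coverage_utility`, `min_pairwise_distance`, and `mdms_objective`.

```python
import math
from itertools import combinations


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two selected points."""
    if len(points) < 2:
        return 0.0  # assumption: fewer than two points contribute no diversity
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def coverage_utility(points, covers):
    """A monotone submodular utility: number of distinct items covered."""
    covered = set()
    for p in points:
        covered |= covers[p]
    return float(len(covered))


def mdms_objective(points, covers, lam=1.0):
    """Utility plus lam times min-distance diversity (assumed additive form)."""
    return coverage_utility(points, covers) + lam * min_pairwise_distance(points)


# Hypothetical toy data: each 2-D point covers a small set of labels.
covers = {
    (0.0, 0.0): {"a", "b"},
    (3.0, 4.0): {"b", "c"},
    (0.0, 1.0): {"a"},
}

far_pair = [(0.0, 0.0), (3.0, 4.0)]
near_pair = [(0.0, 0.0), (0.0, 1.0)]
print(mdms_objective(far_pair, covers))   # 3 covered items + distance 5 -> 8.0
print(mdms_objective(near_pair, covers))  # 2 covered items + distance 1 -> 3.0
```

The far pair scores higher on both terms in this toy case. In general the two terms pull in different directions, which is what makes the joint problem hard.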

Innovation

Methods, ideas, or system contributions that make the work stand out.

Greedy Independent Set Thresholding
Min-distance diversification problem
Bicriteria greedy algorithm
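
The idea behind greedy independent set thresholding can be sketched in Python as follows. This is an illustration of the general technique, not the authors' implementation, and it carries no approximation guarantee as written. The geometric threshold grid controlled by `eps`, the additive objective with weight `lam`, the tie-breaking, the zero-diversity convention for small subsets, and the function names are all assumptions. The input is assumed to contain at least two distinct points.

```python
import math


def greedy_independent_set(points, gain, d, k):
    """Greedily pick up to k points, each at distance >= d from earlier picks."""
    selected = []
    candidates = list(points)
    while candidates and len(selected) < k:
        best = max(candidates, key=lambda p: gain(selected, p))
        selected.append(best)
        # Drop the chosen point and every candidate closer than d to it.
        candidates = [p for p in candidates
                      if p != best and math.dist(p, best) >= d]
    return selected


def gist_sketch(points, utility, k, lam=1.0, eps=0.1):
    """Try a grid of distance thresholds and keep the best-scoring subset."""
    def gain(selected, p):
        return utility(selected + [p]) - utility(selected)

    def objective(subset):
        pairs = [math.dist(p, q)
                 for i, p in enumerate(subset) for q in subset[i + 1:]]
        div = min(pairs) if pairs else 0.0  # assumed convention for |S| < 2
        return utility(subset) + lam * div

    dists = [math.dist(p, q)
             for i, p in enumerate(points) for q in points[i + 1:]]
    d_min = min(x for x in dists if x > 0)
    d_max = max(dists)

    # Threshold 0 is plain greedy on utility; then a geometric grid up to d_max.
    thresholds = [0.0]
    d = d_min
    while d < d_max:
        thresholds.append(d)
        d *= 1.0 + eps
    thresholds.append(d_max)

    best = []
    for d in thresholds:
        candidate = greedy_independent_set(points, gain, d, k)
        if not best or objective(candidate) > objective(best):
            best = candidate
    return best


# Hypothetical toy instance: weighted points on a line, additive utility.
weights = {(0.0, 0.0): 1.0, (1.0, 0.0): 5.0, (10.0, 0.0): 1.0}
points = list(weights)
result = gist_sketch(points, lambda s: sum(weights[p] for p in s), k=2)
print(result)  # -> [(1.0, 0.0), (10.0, 0.0)]
```

In this toy run, plain greedy at threshold 0 pairs the heavy point with its close neighbor for a score of 7. A slightly larger threshold forces the second pick to be far away and raises the score to 15.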