Hashing for Fast Pattern Set Selection

📅 2025-07-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the problem of efficiently selecting a pattern set when patterns may match the data only approximately. Conventional approaches incur prohibitive computational overhead once approximate matches are allowed, and so fail to balance efficiency against reconstruction fidelity. We propose a fast greedy algorithm based on bottom-k hashing, described as the first adaptation of bottom-k hashing to the approximate-pattern setting. The method combines reconstruction-error evaluation with an adaptive approximate matching mechanism, achieving near-optimal solution quality while substantially accelerating the search. Experiments on synthetic and real-world datasets show that the approach speeds up standard greedy selection by one to two orders of magnitude, with only a marginal increase in reconstruction error (1.2%–4.7%). The method is applicable to tasks such as tiling databases, Boolean matrix factorization, and redescription mining.

📝 Abstract

Pattern set mining, which is the task of finding a good set of patterns instead of all patterns, is a fundamental problem in data mining. Many different definitions of what constitutes a good set have been proposed in recent years. In this paper, we consider the reconstruction error as a proxy measure for the goodness of the set, and concentrate on the adjacent problem of how to find a good set efficiently. We propose a method based on bottom-k hashing for efficiently selecting the set and extend the method for the common case where the patterns might only appear in approximate form in the data. Our approach has applications in tiling databases, Boolean matrix factorization, and redescription mining, among others. We show that our hashing-based approach is significantly faster than the standard greedy algorithm while obtaining almost equally good results in both synthetic and real-world data sets.

Problem

Research questions and friction points this paper is trying to address.

Efficiently selecting a good pattern set with low reconstruction error

Handling approximate pattern appearances in data efficiently

Improving speed over standard greedy algorithms in pattern mining
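
For orientation, the standard greedy baseline mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the data is modelled as a set of (row, column) cells that hold a 1, each candidate pattern as the set of cells it covers, and the reconstruction error as the number of cells where the data and the union of chosen patterns disagree. The names `reconstruction_error`, `greedy_select`, and `budget` are hypothetical.

```python
def reconstruction_error(ones, covered):
    """Count cells where the data and the reconstruction disagree.

    Assumption: error is the size of the symmetric difference between
    the 1-cells of the data and the cells covered by chosen patterns.
    """
    return len(ones ^ covered)


def greedy_select(ones, patterns, budget):
    """Pick up to `budget` patterns, each time taking the one that
    lowers the reconstruction error the most (hypothetical baseline)."""
    chosen, covered = [], set()
    error = reconstruction_error(ones, covered)
    for _ in range(budget):
        best_name, best_error = None, error
        for name, cells in patterns.items():
            if name in chosen:
                continue
            # Exact evaluation: this full set union is the costly step
            # that a hashing-based method tries to avoid.
            candidate = reconstruction_error(ones, covered | cells)
            if candidate < best_error:
                best_name, best_error = name, candidate
        if best_name is None:  # no remaining pattern improves the error
            break
        chosen.append(best_name)
        covered |= patterns[best_name]
        error = best_error
    return chosen, error


# Toy data: pattern "B" covers one 0-cell, i.e. it appears only approximately.
ones = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 3)}
patterns = {
    "A": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "B": {(2, 2), (2, 3), (3, 2), (3, 3)},
    "C": {(0, 0), (0, 1)},
}
print(greedy_select(ones, patterns, budget=3))
```

Every round re-evaluates every remaining candidate against the current cover, which is why this baseline becomes slow when the candidate set is large.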

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses bottom-k hashing for efficient selection

Extends method for approximate pattern matching

Faster than standard greedy algorithm
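
The core idea of bottom-k hashing can be sketched as below. This is a generic illustration of the technique, not the paper's algorithm: each item is hashed to a pseudo-random value in (0, 1], only the k smallest values are kept, and the k-th smallest value yields a cardinality estimate. How the paper applies such sketches to pattern selection and approximate matches is not shown here; the helper names and the choice of BLAKE2 as the hash are assumptions.

```python
import hashlib
import heapq


def unit_hash(item, seed=0):
    """Map an item deterministically to a float in (0, 1]."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / 2**64


def bottom_k(items, k, seed=0):
    """Return the k smallest distinct hash values, sorted ascending."""
    return heapq.nsmallest(k, {unit_hash(x, seed) for x in items})


def merge(sketch_a, sketch_b, k):
    """Sketch of a union: the k smallest values across both sketches."""
    return heapq.nsmallest(k, set(sketch_a) | set(sketch_b))


def estimate_size(sketch, k):
    """Estimate the number of distinct items summarized by a sketch."""
    if len(sketch) < k:
        return float(len(sketch))  # fewer than k items: the count is exact
    return (k - 1) / sketch[-1]    # standard bottom-k estimator


k = 256
covered = bottom_k(range(0, 5000), k)       # cells already covered
candidate = bottom_k(range(3000, 9000), k)  # cells of a candidate pattern
union = merge(covered, candidate, k)
# Estimated number of newly covered cells, without touching the full sets.
gain = estimate_size(union, k) - estimate_size(covered, k)
print(round(gain))
```

Merging two sketches touches at most 2k values regardless of how large the underlying sets are, which is where the speedup over exact set unions would come from. The estimate carries a relative error on the order of 1/sqrt(k), so k trades accuracy against speed.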
