NSPG-Miner: Mining Repetitive Negative Sequential Patterns

📅 2025-02-09
🤖 AI Summary
Traditional sequential pattern mining (SPM) with gap constraints finds frequent repetitive subsequences, called positive sequential patterns with gap constraints (PSPGs), but cannot identify items that are frequently missing from those patterns. To address this, the paper studies negative sequential patterns with gap constraints (NSPGs) and proposes NSPG-Miner, an algorithm that mines frequent PSPGs and NSPGs simultaneously. It features a pattern join strategy with negative patterns that generates positive and negative candidates at the same time, and a NegPair algorithm that computes pattern support using a key-value pair array, handling gap constraints and negative items together without rescanning the original sequence. Experiments with 11 datasets and 11 competitive algorithms indicate that the strategies adopted by NSPG-Miner are effective and that it discovers more valuable information than state-of-the-art algorithms. The source code and datasets are publicly available.

📝 Abstract
Sequential pattern mining (SPM) with gap constraints (also known as repetitive SPM, or tandem repeat discovery in bioinformatics) can find frequent repetitive subsequences satisfying gap constraints, which are called positive sequential patterns with gap constraints (PSPGs). However, classical SPM with gap constraints cannot find the frequent missing items in the PSPGs. To tackle this issue, this paper explores negative sequential patterns with gap constraints (NSPGs). We propose an efficient NSPG-Miner algorithm that can mine both frequent PSPGs and NSPGs simultaneously. To effectively reduce candidate patterns, we propose a pattern join strategy with negative patterns, which can generate both positive and negative candidate patterns at the same time. To calculate the support (frequency of occurrence) of a pattern in each sequence, we explore a NegPair algorithm that employs a key-value pair array structure to deal with the gap constraints and the negative items simultaneously. NegPair avoids redundant rescanning of the original sequence, thus improving the efficiency of the algorithm. To evaluate the performance of NSPG-Miner, 11 competitive algorithms and 11 datasets are employed. The experimental results not only validate the effectiveness of the strategies adopted by NSPG-Miner, but also verify that NSPG-Miner can discover more valuable information than the state-of-the-art algorithms. Algorithms and datasets can be downloaded from https://github.com/wuc567/Pattern-Mining/tree/master/NSPG-Miner.
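To make the notion of support under gap constraints with negative items concrete, here is a minimal sketch in Python. It is not the NegPair algorithm: it is a plain dynamic-programming counter written for illustration only. The function name `count_occurrences`, the representation of negative items as one forbidden set per gap, and the choice to count every occurrence (overlaps allowed) are all assumptions; the paper may define occurrences and support differently.

```python
from typing import List, Sequence, Set


def count_occurrences(
    seq: Sequence[str],
    items: List[str],
    negatives: List[Set[str]],
    min_gap: int,
    max_gap: int,
) -> int:
    """Count occurrences of items[0], items[1], ... in seq (illustrative only).

    Assumptions (not taken from the paper):
      * the number of positions skipped between consecutive matched items
        must lie in [min_gap, max_gap];
      * negatives[k] is the set of items that must NOT appear in the gap
        between items[k] and items[k + 1];
      * every occurrence is counted, even if occurrences share positions.
    """
    n = len(seq)
    # ways[i] = number of partial occurrences of the prefix matched so far
    # whose last matched position is i.
    ways = [1 if seq[i] == items[0] else 0 for i in range(n)]
    for k in range(1, len(items)):
        forbidden = negatives[k - 1]
        nxt = [0] * n
        for j in range(n):
            if seq[j] != items[k]:
                continue
            # Candidate previous positions i with gap size j - i - 1
            # inside [min_gap, max_gap].
            for i in range(max(0, j - 1 - max_gap), j - min_gap):
                if ways[i] == 0:
                    continue
                # Reject the pair if a forbidden item sits inside the gap.
                if any(seq[t] in forbidden for t in range(i + 1, j)):
                    continue
                nxt[j] += ways[i]
        ways = nxt
    return sum(ways)


sequence = list("acbab")
# Positive pattern a[0,2]b: matches at positions (0, 2) and (3, 4).
print(count_occurrences(sequence, ["a", "b"], [set()], 0, 2))   # 2
# Same pattern, but "c" must be absent from the gap: only (3, 4) remains.
print(count_occurrences(sequence, ["a", "b"], [{"c"}], 0, 2))   # 1
```

This naive version rescans the gap for every candidate pair, which is exactly the kind of repeated work the abstract says NegPair avoids with its key-value pair array.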
Problem

Research questions and friction points this paper is trying to address.

Mining frequent missing items in sequences
Discovering negative sequential patterns efficiently
Reducing redundant scanning in sequence analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mines NSPGs and PSPGs simultaneously
Uses a pattern join strategy with negative patterns to generate positive and negative candidates together
Employs NegPair for efficient support calculation
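The candidate-generation idea can be illustrated with a generic suffix-prefix join, a common form of pattern join in gap-constrained SPM. This is a sketch under stated assumptions: the summary above does not spell out how NSPG-Miner produces negative candidates during the join, so the code shows only the classical positive case, and the function name `join_candidates` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def join_candidates(frequent: Set[Pattern]) -> Set[Pattern]:
    """Build length-(m + 1) candidates from frequent length-m patterns.

    Two patterns p and q are joined when p without its first item equals
    q without its last item; the candidate is p followed by q's last item.
    Positive items only; negative candidates are out of scope here.
    """
    # Index patterns by their prefix (all items except the last).
    by_prefix: Dict[Pattern, List[Pattern]] = {}
    for q in frequent:
        by_prefix.setdefault(q[:-1], []).append(q)

    candidates: Set[Pattern] = set()
    for p in frequent:
        # p[1:] is the suffix of p; look up patterns whose prefix matches it.
        for q in by_prefix.get(p[1:], []):
            candidates.add(p + (q[-1],))
    return candidates


frequent_2 = {("a", "b"), ("b", "c"), ("b", "a")}
print(sorted(join_candidates(frequent_2)))
# [('a', 'b', 'a'), ('a', 'b', 'c'), ('b', 'a', 'b')]
```

Joining only frequent patterns keeps the candidate set much smaller than extending every frequent pattern with every item, which is the usual motivation for this kind of strategy.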
Authors
Yan Li
School of Economics and Management, Hebei University of Technology, China
Zhulin Wang
School of Artificial Intelligence, Hebei University of Technology, China
Jing Liu
School of Artificial Intelligence, Hebei University of Technology, China
Lei Guo
State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, China
Philippe Fournier-Viger
Distinguished Professor, Shenzhen University, China
Data Mining, Artificial Intelligence, Big Data, Pattern Mining, Complex Data
Youxi Wu
Hebei University of Technology
Data mining and machine learning
Xindong Wu
Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education of China), Hefei University of Technology, China