🤖 AI Summary
This paper addresses the limited expressiveness of existing null models for statistical hypothesis testing on binary transactional and sequence datasets: they preserve fewer properties of the observed dataset than the analysis may require. The proposed null models preserve the Bipartite Joint Degree Matrix (BJDM) of the bipartite (multi-)graph corresponding to the dataset, which also fixes the number of “caterpillars” (paths of length three), in addition to the properties that earlier models preserve. To sample datasets from these null models, the authors describe Alice, a suite of Markov chain Monte Carlo (MCMC) algorithms built on a carefully defined set of states and efficient operations for moving between them. In the experimental evaluation, Alice mixes fast and scales well, and the new null models yield different significant results than null models previously considered in the literature.
📝 Abstract
We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature.
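To make the preserved quantity concrete, the following Python sketch computes the Bipartite Joint Degree Matrix of a small transactional dataset and checks that it determines the number of caterpillars. It assumes a simple bipartite graph (each item appears at most once per transaction), in which entry (i, j) counts the edges joining a transaction of length i to an item of support j, and each such edge is the middle edge of (i - 1)(j - 1) paths of length three. The toy dataset and all function names, including `degree_matched_swap`, are hypothetical illustrations. The swap is one simple move that leaves the matrix unchanged; it is not claimed to be the operation Alice uses.

```python
from collections import Counter


def item_supports(transactions):
    """Number of transactions containing each item (item-side degrees)."""
    return Counter(item for t in transactions for item in set(t))


def bjdm(transactions):
    """Map (transaction length, item support) -> number of edges with those degrees."""
    support = item_supports(transactions)
    matrix = Counter()
    for t in transactions:
        items = set(t)
        for item in items:
            matrix[(len(items), support[item])] += 1
    return matrix


def caterpillars_from_bjdm(matrix):
    """Paths of length three, counted through their middle edge."""
    return sum(n * (i - 1) * (j - 1) for (i, j), n in matrix.items())


def caterpillars_brute_force(transactions):
    """Enumerate paths t2 - item - t - item2 explicitly, as a cross-check."""
    sets = [set(t) for t in transactions]
    count = 0
    for a, t in enumerate(sets):
        for item in t:
            for b, t2 in enumerate(sets):
                if b != a and item in t2:
                    count += len(t) - 1  # choices of item2 in t other than item
    return count


def degree_matched_swap(transactions, i, x, j, y):
    """Move x from transaction i to j and y from j to i (hypothetical helper).

    Allowed only when the two transactions have equal length or the two
    items have equal support, which keeps the BJDM unchanged.
    """
    sets = [set(t) for t in transactions]
    if x not in sets[i] or y not in sets[j] or y in sets[i] or x in sets[j]:
        raise ValueError("swap would create a duplicate or remove a missing item")
    support = item_supports(sets)
    if len(sets[i]) != len(sets[j]) and support[x] != support[y]:
        raise ValueError("neither the transactions nor the items have equal degree")
    sets[i] = (sets[i] - {x}) | {y}
    sets[j] = (sets[j] - {y}) | {x}
    return sets


# Toy dataset: four transactions over items a, b, c.
observed = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}, {"c"}]
observed_bjdm = bjdm(observed)
n_caterpillars = caterpillars_from_bjdm(observed_bjdm)

# Items b and c both have support 3, so exchanging them between
# transactions 0 and 3 is a degree-matched swap.
sampled = degree_matched_swap(observed, 0, "b", 3, "c")
```

In this example `sampled` differs from `observed` as a dataset, yet `bjdm(sampled) == observed_bjdm`, so transaction lengths, item supports, and the caterpillar count all carry over. A null model that only fixes lengths and supports would also admit datasets whose caterpillar count differs.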