SAT-sampling for statistical significance testing in sparse contingency tables

📅 2025-11-07
🤖 AI Summary
Problem: Exact conditional inference for sparse contingency tables, especially those with structural zeros, is hard to carry out with conventional Markov chain Monte Carlo (MCMC) methods built on Markov bases: computing the basis can be intractable, and the resulting chains may converge slowly. Method: We propose SAT-Driven MCMC, a framework that encodes the fixed-margin fiber as a Boolean satisfiability (SAT) problem and uses modern SAT samplers to generate global proposals; these are combined with local moves to form a hybrid proposal mechanism. This approach needs no precomputed Markov basis, does not necessarily require the fiber to be connected by local moves, and accommodates structural zeros directly. We also analyze the sampling bias that SAT samplers may introduce and provide diagnostics and practical mitigation, so that the hybrid chains target the correct stationary distribution. Results: Across benchmarks, including small and involved tables with many structural zeros, the method delivers reliable conditional p-values and often outperforms samplers that rely on precomputed Markov bases.

📝 Abstract
Exact conditional tests for contingency tables require sampling from fibers with fixed margins. Classical Markov basis MCMC is general but often impractical: computing a full Markov basis that connects all fibers of a given constraint matrix can be infeasible, and the resulting chains may converge slowly, especially in sparse settings or in the presence of structural zeros. We introduce a SAT-based alternative that encodes fibers as Boolean circuits, which allows modern SAT samplers to generate tables at random. We analyze the sampling bias that SAT samplers may introduce, provide diagnostics, and propose practical mitigation. We also propose hybrid MCMC schemes that combine SAT proposals with local moves to ensure the correct stationary distribution. These schemes do not necessarily require connectivity via local moves, which is particularly beneficial in the presence of structural zeros. Across benchmarks, including small and involved tables with many structural zeros where pure Markov-basis methods underperform, our methods deliver reliable conditional p-values and often outperform samplers that rely on precomputed Markov bases.
Problem

Research questions and friction points this paper is trying to address.

Exact conditional tests require fiber sampling with fixed margins
Markov basis MCMC faces computational infeasibility and slow convergence
SAT-based sampling addresses sparse tables with structural zeros
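
The first point can be made concrete with a minimal Python sketch of an exact conditional test by brute-force fiber enumeration. It is an illustration only and not the paper's SAT encoding: the function names (`enumerate_fiber`, `weight`, `exact_p_value`), the 3×3 example with a zero diagonal, and the use of the table probability as the test statistic are all assumptions made here. The sketch relies on the standard fact that, under (quasi-)independence, the conditional probability of a table given its margins is proportional to 1/∏ x_ij!.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def enumerate_fiber(row_sums, col_sums, zeros=frozenset()):
    """List every non-negative integer table with the given margins.

    Cells whose (row, col) index is in `zeros` are structural zeros.
    Brute force: only feasible for very small tables.
    """
    n_cols = len(col_sums)
    fiber = []

    def fill(i, remaining, rows):
        if i == len(row_sums):
            if not any(remaining):          # all column sums used up exactly
                fiber.append(tuple(rows))
            return
        # Upper bound for each cell in row i (0 for structural zeros).
        limits = [0 if (i, j) in zeros else min(row_sums[i], remaining[j])
                  for j in range(n_cols)]
        for row in product(*(range(m + 1) for m in limits)):
            if sum(row) == row_sums[i]:
                left = [r - v for r, v in zip(remaining, row)]
                fill(i + 1, left, rows + [row])

    fill(0, list(col_sums), [])
    return fiber


def weight(table):
    """Unnormalized conditional probability: 1 / prod(x_ij!)."""
    return Fraction(1, prod(factorial(v) for row in table for v in row))


def exact_p_value(observed, zeros=frozenset()):
    """Sum the probabilities of all fiber tables no more likely than `observed`."""
    row_sums = [sum(r) for r in observed]
    col_sums = [sum(c) for c in zip(*observed)]
    fiber = enumerate_fiber(row_sums, col_sums, zeros)
    total = sum(weight(t) for t in fiber)
    w_obs = weight(observed)
    return sum(weight(t) for t in fiber if weight(t) <= w_obs) / total
```

For the table `((0, 3, 0), (0, 0, 3), (3, 0, 0))` with structural zeros on the diagonal, the fiber contains four tables and `exact_p_value` returns `Fraction(1, 28)`. Enumeration like this breaks down quickly as tables grow, which is why the fiber has to be sampled in practice.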
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAT-based sampling for contingency table fibers
Hybrid MCMC combining SAT proposals with local moves
Bias diagnostics and mitigation for SAT samplers
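
The hybrid idea can be sketched as a Metropolis-Hastings chain that mixes local moves with global proposals. This is a toy under stated assumptions, not the paper's algorithm: the global proposal is a uniform draw from a small, pre-listed fiber (`FIBER`), standing in for a SAT sampler, and all names (`local_move`, `hybrid_chain`, `p_global`) are hypothetical. The example uses 3×3 tables with a zero diagonal and all margins equal to 3, where every ±1 move on a 2×2 minor touches a structural zero, so local moves alone cannot leave the starting table.

```python
import random
from math import factorial, prod

ZEROS = frozenset({(0, 0), (1, 1), (2, 2)})   # structural zeros on the diagonal
# The whole fiber for margins (3, 3, 3) x (3, 3, 3) with a zero diagonal.
FIBER = [((0, a, 3 - a), (3 - a, 0, a), (a, 3 - a, 0)) for a in range(4)]


def weight(table):
    """Unnormalized target probability: 1 / prod(x_ij!)."""
    return 1.0 / prod(factorial(v) for row in table for v in row)


def local_move(table, zeros, rng):
    """Symmetric +1/-1 move on a random 2x2 minor; None if it leaves the fiber."""
    i, k = rng.sample(range(len(table)), 2)
    j, l = rng.sample(range(len(table[0])), 2)
    new = [list(row) for row in table]
    for r, c, d in ((i, j, 1), (k, l, 1), (i, l, -1), (k, j, -1)):
        new[r][c] += d
        if new[r][c] < 0 or (r, c) in zeros:
            return None
    return tuple(map(tuple, new))


def hybrid_chain(start, fiber, zeros, steps, p_global=0.5, seed=0):
    """Run the mixed chain and return visit counts per table."""
    rng = random.Random(seed)
    state, visits = start, {}
    for _ in range(steps):
        if rng.random() < p_global:
            # Stand-in for a SAT sampler; ASSUMED uniform over the fiber.
            proposal = rng.choice(fiber)
        else:
            proposal = local_move(state, zeros, rng)
        # Both proposal types are symmetric here, so the acceptance
        # ratio reduces to the ratio of target weights.
        if proposal is not None and rng.random() < min(1.0, weight(proposal) / weight(state)):
            state = proposal
        visits[state] = visits.get(state, 0) + 1
    return visits
```

With `p_global=0.0` the chain never leaves its starting table, while with global proposals switched on it visits all four tables. If the global sampler is not uniform, as the bias analysis anticipates for real SAT samplers, the acceptance ratio would additionally need the proposal probabilities of the two tables, or some other correction.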