Target-Aware Data Augmentation for SAT Prediction

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high cost and poor scalability of generating training data for learning-based approaches to SAT, which typically depend on expensive solver-produced labels. The authors propose a solver-free, target-aware data generation framework that constructs SAT and UNSAT instances whose labels are correct by construction and whose structure matches that of a given benchmark. They also introduce a linear-programming-aware graph neural network (LPGNN) that incorporates constraint-violation residuals into its message passing. The approach is reported to speed up data generation by orders of magnitude and to substantially outperform existing GNN models on satisfiability prediction, supporting efficient and scalable learning from synthetic data.
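One common way to read "constraint-violation residuals" is through the LP relaxation of a CNF formula. Each clause becomes a linear inequality over variables relaxed to [0, 1]: the sum of x_i over its positive literals plus the sum of (1 - x_j) over its negated literals must be at least 1. The sketch below computes how far each clause falls short of that bound at a fractional point. It is an illustrative assumption, not the LPGNN formulation itself; the function name clause_residuals and the signed-integer clause encoding (DIMACS style) are choices made for this example.

```python
def clause_residuals(clauses, x):
    """Return the LP-relaxation violation of each clause at point x.

    clauses: list of clauses, each a list of nonzero ints
             (v means variable v, -v means its negation).
    x:       dict mapping variable index to a value in [0, 1].
    """
    residuals = []
    for clause in clauses:
        # Left-hand side of the relaxed clause inequality (must be >= 1).
        lhs = sum(x[abs(lit)] if lit > 0 else 1.0 - x[abs(lit)] for lit in clause)
        # Residual is the shortfall below 1, clipped at zero when satisfied.
        residuals.append(max(0.0, 1.0 - lhs))
    return residuals


# Hypothetical two-clause formula: (x1 OR NOT x2) AND (x2 OR x3).
example_clauses = [[1, -2], [2, 3]]
example_point = {1: 0.0, 2: 1.0, 3: 0.0}
example_residuals = clause_residuals(example_clauses, example_point)
```

In a message-passing model, per-clause values of this kind could be attached to clause nodes as extra features at each round; how the residuals actually enter the LPGNN update is not specified in this summary.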

📝 Abstract

Learning-based approaches to NP-hard problems have shown increasing promise, but their progress is fundamentally constrained by the high cost of generating labeled training data. In domains such as Boolean satisfiability (SAT), standard pipelines rely on solver-in-the-loop labeling, which scales poorly with problem size and limits the amount of usable supervision. This bottleneck hinders the broader goal of leveraging machine learning to capture structure in hard combinatorial problems. In this work, we propose a target-aware, solver-free data generation framework for SAT that produces correctly labeled SAT and UNSAT instances by construction, eliminating the need for expensive solver calls. Our method aligns generated instances with the structural properties of a target benchmark, making synthetic data effective for downstream learning. We further develop a linear-programming-aware graph neural network (LPGNN) architecture that incorporates constraint-violation residuals into message passing, enabling the model to exploit underlying optimization structure. Together, these contributions support a data-centric paradigm for learning on NP-hard problems, where scalable, task-aligned data generation is as critical as model design. Our approach yields orders-of-magnitude speedups in data generation, demonstrating that benchmark-aligned synthetic data can effectively augment solver-labeled datasets for GNN-based SAT prediction.
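To make "correctly labeled by construction" concrete, here is a minimal sketch of two textbook constructions, assuming a DIMACS-style signed-integer clause encoding. SAT instances come from a planted (hidden) assignment, and UNSAT instances embed a small contradictory core. This is not the paper's generator: it matches a target only in variable and clause counts, whereas the framework described above aligns richer structural properties of the benchmark. All function names are hypothetical.

```python
import itertools
import random


def planted_sat(num_vars, num_clauses, k=3, seed=0):
    """Random k-CNF satisfied by a hidden assignment, so the label SAT is known."""
    rng = random.Random(seed)
    hidden = {v: rng.choice([True, False]) for v in range(1, num_vars + 1)}
    clauses = []
    while len(clauses) < num_clauses:
        chosen = rng.sample(range(1, num_vars + 1), k)
        clause = [v if rng.random() < 0.5 else -v for v in chosen]
        # Keep a clause only if the hidden assignment satisfies it.
        if any((lit > 0) == hidden[abs(lit)] for lit in clause):
            clauses.append(clause)
    return clauses, hidden


def planted_unsat(num_vars, num_clauses, k=3, seed=0):
    """k-CNF containing all 2^k sign patterns over k variables, hence UNSAT."""
    rng = random.Random(seed)
    core_vars = rng.sample(range(1, num_vars + 1), k)
    # Every assignment falsifies exactly one of these 2^k core clauses.
    core = [
        [v if (mask >> i) & 1 else -v for i, v in enumerate(core_vars)]
        for mask in range(2 ** k)
    ]
    filler, _ = planted_sat(num_vars, max(num_clauses - len(core), 0), k, seed + 1)
    clauses = core + filler
    rng.shuffle(clauses)
    return clauses


def satisfies(clauses, assignment):
    """Check that every clause has at least one true literal."""
    return all(
        any((lit > 0) == assignment[abs(lit)] for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, num_vars):
    """Exhaustive check, usable only for tiny instances (sanity testing)."""
    for bits in itertools.product([False, True], repeat=num_vars):
        if satisfies(clauses, dict(zip(range(1, num_vars + 1), bits))):
            return True
    return False
```

Neither function calls a solver: the SAT label is certified by the hidden assignment and the UNSAT label by the embedded core. A core this small is easy to detect, so a practical generator would need harder and better-hidden contradictions, along with the benchmark alignment that this sketch omits.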

Problem

Research questions and friction points this paper is trying to address.

SAT prediction

data augmentation

NP-hard problems

labeled data generation

Boolean satisfiability

Innovation

Methods, ideas, or system contributions that make the work stand out.

target-aware data augmentation

solver-free data generation

SAT prediction