🤖 AI Summary
This work addresses the challenge that large language models (LLMs) often fail to guarantee hard-constraint satisfaction in combinatorial optimization, limiting their practical applicability. To address this, the authors propose FALCON, a framework that integrates grammar-constrained decoding, a feasibility repair layer, and adaptive Best-of-N sampling into an end-to-end pipeline that guarantees solution feasibility. They further introduce Best-anchored Objective-guided Preference Optimization (BOPO), a training objective that requires no human annotations and comes with a convergence guarantee. Evaluated on seven NP-hard combinatorial optimization problems, FALCON attains 100% feasibility while matching or surpassing the solution quality of state-of-the-art neural and LLM-based solvers.
📝 Abstract
Large language models (LLMs) have emerged as promising general-purpose solvers for combinatorial optimization (CO), yet they fundamentally lack mechanisms to guarantee solution feasibility, which is critical for real-world deployment. In this work, we introduce FALCON, a framework that ensures 100\% feasibility through three key innovations: (i) \emph{grammar-constrained decoding} enforces syntactic validity, (ii) a \emph{feasibility repair layer} corrects semantic constraint violations, and (iii) \emph{adaptive Best-of-$N$ sampling} allocates inference compute efficiently. To train the underlying LLM, we introduce Best-anchored Objective-guided Preference Optimization (BOPO), which weights preference pairs by their objective gap, providing dense supervision without human labels. Theoretically, we prove convergence of BOPO and bound the repair-induced quality loss. Empirically, across seven NP-hard CO problems, FALCON achieves perfect feasibility while matching or exceeding the solution quality of state-of-the-art neural and LLM-based solvers.
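The abstract's three-stage inference loop (constrained generation, feasibility repair, adaptive Best-of-$N$) can be sketched as a generic pipeline. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate`, `repair`, `is_feasible`, and `objective` are hypothetical callbacks, and the adaptive rule shown (draw a small batch, keep sampling up to a cap while improving) is one plausible reading of "allocates inference compute efficiently".

```python
def solve_with_falcon_style_pipeline(generate, repair, is_feasible, objective,
                                     n_min=2, n_max=8):
    """Illustrative sketch of the three-stage loop from the abstract.

    generate()     -- samples one (grammar-constrained) candidate solution
    repair(s)      -- attempts to project a candidate onto the feasible region
    is_feasible(s) -- checks semantic constraints
    objective(s)   -- value to minimize

    All interfaces here are assumptions; the paper's actual APIs may differ.
    """
    best = None
    draws = 0
    while draws < n_max:
        cand = generate()
        draws += 1
        if not is_feasible(cand):
            cand = repair(cand)            # feasibility repair layer
        if is_feasible(cand):
            if best is None or objective(cand) < objective(best):
                best = cand
            if draws >= n_min:             # stop once the minimum budget is met
                break
    return best
```

On a toy problem where solutions are integers, "feasible" means even, and repair rounds down, the loop returns a feasible candidate after the minimum number of draws.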
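The BOPO objective is only characterized here as a pairwise loss anchored to the best solution and weighted by objective gaps. One plausible instantiation, shown purely as a sketch, is a DPO-style logistic loss over (best, other) pairs with gap-proportional weights; the function name, the normalization, and the `beta` temperature are all illustrative assumptions, not the paper's definition.

```python
import math

def bopo_loss(candidates, beta=1.0):
    """Sketch of a best-anchored, objective-gap-weighted pairwise loss.

    candidates: list of (log_prob, objective) for sampled solutions of a
    minimization problem. The best candidate (lowest objective) anchors
    every preference pair, and each pair is weighted by its normalized
    objective gap, yielding dense supervision without human labels.
    """
    best = min(candidates, key=lambda c: c[1])
    logp_best, obj_best = best
    others = [c for c in candidates if c is not best]
    gaps = [obj - obj_best for _, obj in others]
    max_gap = max(gaps) if gaps and max(gaps) > 0 else 1.0
    loss = 0.0
    for (logp, _), gap in zip(others, gaps):
        weight = gap / max_gap                    # label-free preference weight
        margin = beta * (logp_best - logp)        # prefer the best anchor
        loss += -weight * math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss / max(len(others), 1)
```

Weighting by the gap means pairs with large quality differences contribute stronger gradients, which matches the abstract's claim of dense, annotation-free supervision.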