🤖 AI Summary
This work addresses the difficulty large language models (LLMs) have in producing feasible and optimal solutions for optimization tasks that involve multiple constraints and user preferences. The authors propose a hybrid reasoning approach in which an LLM translates a natural language problem description into Python code that encodes the constraints and preferences as a preference-based MaxSAT problem, which an exact MaxSAT solver then solves. An independent verification step checks the returned solutions for feasibility and optimality against a canonical MaxSAT encoding, and it accepts different encodings and any of several optimal solutions. Evaluated with both open-source and closed-access LLMs on three families of preference-based reasoning tasks, the pipeline substantially outperforms direct-answer, chain-of-thought, and program-of-thought baselines, which rarely produce feasible solutions; its acceptance rates exceed 80% in some settings.
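To make the encoding step concrete, here is a minimal sketch of the kind of Python code an LLM might generate for a toy task. The task, the names `HARD`, `SOFT`, and `solve_maxsat`, and the weights are all hypothetical. Hard clauses model constraints that must hold, and weighted soft clauses model user preferences whose violation costs their weight. A brute-force search stands in for the exact MaxSAT solver, which the summary does not name; a real pipeline would pass the same clauses to a dedicated solver.

```python
from itertools import product

# Hypothetical toy task: a robot chooses among three jobs.
# Variables use DIMACS-style integers: 1 = clean, 2 = charge, 3 = deliver.
# A clause is a list of literals; a negative literal means "variable is false".

HARD = [
    [1, 3],    # constraint: clean or deliver (at least one)
    [-1, -3],  # constraint: not both clean and deliver (time limit)
    [2, -3],   # constraint: delivering requires charging
]

# Soft clauses (user preferences) as (clause, weight); violating one costs its weight.
SOFT = [
    ([3], 5),   # strong preference: deliver
    ([-2], 2),  # mild preference: skip charging
    ([1], 1),   # weak preference: clean
]

NUM_VARS = 3


def satisfied(clause, assignment):
    """True if at least one literal of the clause holds under the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def cost(soft, assignment):
    """Total weight of the soft clauses violated by the assignment."""
    return sum(w for clause, w in soft if not satisfied(clause, assignment))


def solve_maxsat(hard, soft, num_vars):
    """Brute-force weighted partial MaxSAT, a stand-in for an exact solver.

    Returns (minimum cost, one optimal assignment), or None if the hard
    clauses are unsatisfiable. Only practical for a handful of variables.
    """
    best = None
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if not all(satisfied(c, assignment) for c in hard):
            continue  # infeasible: violates a hard constraint
        c = cost(soft, assignment)
        if best is None or c < best[0]:
            best = (c, assignment)
    return best


if __name__ == "__main__":
    best_cost, model = solve_maxsat(HARD, SOFT, NUM_VARS)
    print(best_cost, model)  # → 3 {1: False, 2: True, 3: True}
```

Here the optimum delivers and therefore must charge, paying the small penalties for charging and not cleaning rather than the larger penalty for skipping delivery.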
📝 Abstract
Large Language Models (LLMs) excel at understanding natural language but struggle with optimisation tasks involving multiple constraints and user-defined preferences, which commonly arise in domains such as robotics. We propose a hybrid reasoning approach in which LLMs externalise reasoning through code generation. Given a natural language problem description, an LLM generates Python code that encodes user-defined constraints and preferences as a preference-based Maximum Satisfiability (MaxSAT) problem, which is then solved by an exact MaxSAT solver. To ensure correctness, solutions returned by the model-generated code are independently verified for feasibility and optimality against a canonical MaxSAT encoding, allowing for different encodings and multiple optimal solutions. We evaluate our approach using both open-source and closed-access LLMs on three families of preference-based reasoning tasks, and compare it against direct-answer, chain-of-thought, and program-of-thought baselines using the same models. While these baselines rarely produce feasible solutions, the MaxSAT-based pipeline achieves substantially higher acceptance rates, in some cases exceeding 80%. Our results demonstrate that LLM-driven code generation combined with preference-based MaxSAT enables optimisation that is solver-verifiable with respect to the generated encodings and substantially improves correctness under independently verified reference semantics.
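The verification idea can be illustrated with a minimal sketch. It assumes that the generated encoding and a hand-written canonical encoding share the same named decision variables but may otherwise differ, for instance by adding auxiliary variables. A candidate is accepted if it satisfies the canonical hard constraints and its canonical cost equals the canonical optimum. Matching any optimal assignment suffices, so alternative optima are not penalised. The toy task and the names `CANON_HARD`, `CANON_SOFT`, and `verify` are hypothetical, and brute-force enumeration again stands in for an exact MaxSAT solver.

```python
from itertools import product

# Hypothetical canonical (reference) encoding over named decision variables.
# A literal is (name, polarity); a clause holds if any of its literals matches.
VARS = ["pick_a", "pick_b", "pick_c"]

CANON_HARD = [
    [("pick_a", True), ("pick_b", True), ("pick_c", True)],  # pick at least one
    [("pick_a", False), ("pick_b", False)],                  # a and b conflict
]
CANON_SOFT = [
    ([("pick_a", True)], 3),   # preference: pick a
    ([("pick_b", True)], 3),   # equally strong preference: pick b
    ([("pick_c", False)], 1),  # weak preference: skip c
]


def holds(clause, assignment):
    """True if at least one literal of the clause matches the assignment."""
    return any(assignment[name] == pol for name, pol in clause)


def canonical_cost(assignment):
    """Total weight of canonical soft clauses the assignment violates."""
    return sum(w for clause, w in CANON_SOFT if not holds(clause, assignment))


def canonical_optimum():
    """Minimum canonical cost over all feasible assignments (brute force)."""
    costs = []
    for bits in product([False, True], repeat=len(VARS)):
        a = dict(zip(VARS, bits))
        if all(holds(c, a) for c in CANON_HARD):
            costs.append(canonical_cost(a))
    return min(costs) if costs else None


def verify(candidate):
    """Classify a candidate as 'accepted', 'infeasible', or 'suboptimal'.

    Only canonical variables are read, so auxiliary variables introduced by
    a different (e.g. LLM-generated) encoding are ignored.
    """
    a = {name: candidate[name] for name in VARS}
    if not all(holds(c, a) for c in CANON_HARD):
        return "infeasible"
    if canonical_cost(a) != canonical_optimum():
        return "suboptimal"
    return "accepted"


if __name__ == "__main__":
    # Two different optimal solutions are both accepted.
    print(verify({"pick_a": True, "pick_b": False, "pick_c": False}))  # → accepted
    print(verify({"pick_a": False, "pick_b": True, "pick_c": False,
                  "aux_1": True}))                                     # → accepted
    print(verify({"pick_a": True, "pick_b": True, "pick_c": False}))   # → infeasible
    print(verify({"pick_a": False, "pick_b": False, "pick_c": True}))  # → suboptimal
```

Checking against a fixed reference encoding, rather than trusting the cost reported by the generated code, is what separates "optimal for the generated encoding" from "correct under the reference semantics".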