SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the NP-hard problem of certifying nonnegativity of multivariate polynomials. To overcome the limitations of existing large language models (LLMs) in rigorous symbolic reasoning, the authors propose SoS-7B, a lightweight model specialized for sum-of-squares (SoS) certification. The method introduces (i) SoS-1K, the first expert-annotated benchmark for SoS verification; (ii) a hierarchical, structured reasoning-instruction paradigm that guides multi-stage symbolic deduction; and (iii) efficient fine-tuning that takes only four hours and reaches 81% accuracy. The results demonstrate, for the first time, that O1/R1-style reasoning LLMs can serve as effective “SoS solvers.” Remarkably, SoS-7B outperforms both the 671B DeepSeek-V3 and GPT-4o-mini across all evaluation metrics while consuming only 1.8% and 5% of their computation time, respectively, significantly expanding the frontier of formal mathematical reasoning achievable by compact models.

📝 Abstract
Large Language Models (LLMs) have achieved human-level proficiency across diverse tasks, but their ability to perform rigorous mathematical problem solving remains an open challenge. In this work, we investigate a fundamental yet computationally intractable problem: determining whether a given multivariate polynomial is nonnegative. This problem, closely related to Hilbert's Seventeenth Problem, plays a crucial role in global polynomial optimization and has applications in various fields. First, we introduce SoS-1K, a meticulously curated dataset of approximately 1,000 polynomials, along with expert-designed reasoning instructions based on five progressively challenging criteria. Evaluating multiple state-of-the-art LLMs, we find that without structured guidance, all models perform only slightly above the 50% random-guess baseline. However, high-quality reasoning instructions significantly improve accuracy, boosting performance up to 81%. Furthermore, our 7B model, SoS-7B, fine-tuned on SoS-1K for just 4 hours, outperforms the 671B DeepSeek-V3 and GPT-4o-mini in accuracy while requiring only 1.8% and 5% of the computation time needed by the latter, respectively. Our findings highlight the potential of LLMs to push the boundaries of mathematical reasoning and tackle NP-hard problems.
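To make the certification task concrete: a polynomial p is a sum of squares exactly when it can be written as zᵀQz for some monomial basis z and positive semidefinite Gram matrix Q. The following is a minimal Python sketch of such a certificate check for a hypothetical example polynomial, not the paper's LLM pipeline:

```python
import numpy as np

# Illustrative example (not from the paper):
# p(x, y) = x^4 + 2*x^2*y^2 + y^4 is SoS iff p = z^T Q z
# for the monomial basis z = [x^2, x*y, y^2] and some Q >= 0.
def p(x, y):
    return x**4 + 2 * x**2 * y**2 + y**4

def z(x, y):
    return np.array([x**2, x * y, y**2])

# A candidate Gram matrix for p (here found by inspection;
# in practice Q is computed by a semidefinite-programming solver).
Q = np.diag([1.0, 2.0, 1.0])

# 1) Q must be positive semidefinite: all eigenvalues >= 0.
assert np.linalg.eigvalsh(Q).min() >= -1e-12

# 2) z^T Q z must reproduce p; spot-check at random points.
rng = np.random.default_rng(0)
for xv, yv in rng.standard_normal((100, 2)):
    assert abs(z(xv, yv) @ Q @ z(xv, yv) - p(xv, yv)) < 1e-8

print("Q certifies p as a sum of squares")
```

Finding such a Q is a semidefinite program, which is why SoS decomposition is the standard tractable relaxation of polynomial nonnegativity even though nonnegativity itself is NP-hard; the paper's question is whether reasoning LLMs can carry out this kind of verification symbolically.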
Problem

Research questions and friction points this paper is trying to address.

Determining nonnegativity of multivariate polynomials
Improving LLMs' mathematical reasoning accuracy
Tackling NP-hard problems with efficient LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

SoS-1K dataset with expert-designed reasoning instructions
SoS-7B model fine-tuned for 4 hours
Outperforms larger models with less computation