🤖 AI Summary
This paper systematically investigates the capability boundaries and core challenges of large language models (LLMs) in solving complex, multi-step problems, identifying three critical bottlenecks: weak multi-step reasoning, insufficient domain-knowledge integration, and unverifiable outputs. To address these, the authors present a cross-domain analysis spanning software engineering, mathematical reasoning and proving, data analysis and modeling, and scientific research, organized around a "knowledge–reasoning–verification" co-evolution paradigm. The surveyed approaches integrate chain-of-thought prompting, external knowledge retrieval, tool invocation, and multi-level verification to inject both structured and unstructured knowledge. The analysis clarifies current LLM performance limits on complex tasks, outlines a scalable knowledge-augmented architecture, and identifies six concrete research directions toward trustworthy AI, providing both theoretical foundations and practical pathways for high-reliability AI decision-making.
📝 Abstract
Problem-solving has been a fundamental driver of human progress across numerous domains. With advances in artificial intelligence, Large Language Models (LLMs) have emerged as powerful tools capable of tackling complex problems in diverse fields. Unlike traditional computational systems, LLMs combine raw computational power with an approximation of human reasoning, allowing them to generate solutions, make inferences, and even leverage external computational tools. However, applying LLMs to real-world problem-solving presents significant challenges, including multi-step reasoning, domain knowledge integration, and result verification. This survey explores the capabilities and limitations of LLMs in complex problem-solving, examining techniques such as Chain-of-Thought (CoT) reasoning, knowledge augmentation, and LLM-based and tool-based verification. Additionally, we highlight domain-specific challenges in areas such as software engineering, mathematical reasoning and proving, data analysis and modeling, and scientific research. Finally, we discuss the fundamental limitations of current LLM-based solutions and future directions for LLM-based complex problem-solving from the perspectives of multi-step reasoning, domain knowledge integration, and result verification.