AI Summary
A systematic survey and reproducible evaluation of large language models' (LLMs') inductive reasoning capabilities have been lacking. Method: This work establishes the first comprehensive research framework for LLMs' inductive reasoning. It proposes a three-dimensional analytical paradigm encompassing method taxonomy, benchmark integration, and evaluation methodology; introduces a unified assessment framework grounded in sandbox environments and an observation coverage metric, overcoming the limitations of conventional static evaluation; and investigates how architectural properties (e.g., layer depth, attention mechanisms) and training-data distribution affect inductive generalization. Contribution/Results: Empirical analysis reveals key structural and data-driven determinants of inductive capability; through post-training optimization, test-time scaling, and inductive-enhanced data construction, significant improvements in generalization are achieved on non-unique-answer tasks. The framework provides both theoretical foundations and practical pathways toward cognitive alignment and reliable reasoning in LLMs.
Abstract
Reasoning is an important capability for large language models (LLMs). Among reasoning paradigms, inductive reasoning is fundamental: it is characterized by a particular-to-general thinking process and by the non-uniqueness of its answers. Because induction is crucial for knowledge generalization and aligns closely with human cognition, it is a basic mode of learning and has attracted increasing interest. Despite its importance, inductive reasoning has not yet been systematically surveyed. This paper therefore presents the first comprehensive survey of inductive reasoning for LLMs. First, methods for improving inductive reasoning are categorized into three main areas: post-training, test-time scaling, and data augmentation. Then, current inductive reasoning benchmarks are summarized, and a unified sandbox-based evaluation approach with an observation coverage metric is derived. Finally, we analyze the sources of inductive ability and how simple model architectures and data help with inductive tasks, providing a solid foundation for future research.
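To give a concrete feel for the sandbox-based evaluation idea, the minimal sketch below assumes that "observation coverage" means the fraction of held-out observations a model-induced rule reproduces when executed in a toy sandbox. The function name, the rule representation, and this definition of the metric are illustrative assumptions, not the survey's actual implementation.

```python
from typing import Callable, Iterable, Tuple


def observation_coverage(
    rule: Callable[[int], int],
    observations: Iterable[Tuple[int, int]],
) -> float:
    """Hypothetical metric: fraction of (input, output) observations
    that the induced rule reproduces when executed."""
    obs = list(observations)
    if not obs:
        return 0.0
    matched = sum(1 for x, y in obs if rule(x) == y)
    return matched / len(obs)


# Toy sandbox run: a model has induced the rule "double the input".
induced_rule = lambda x: 2 * x
held_out = [(1, 2), (2, 4), (3, 7)]  # the last pair violates the rule
print(observation_coverage(induced_rule, held_out))
```

Executing an induced rule against observations, rather than string-matching a single gold answer, is one natural way a sandbox can accommodate the non-uniqueness of inductive answers: any rule covering the observations scores well, regardless of its surface form.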