🤖 AI Summary
Prior work lacks systematic evaluation of large language models’ (LLMs) capacity for rule learning through integrated abductive, deductive, and inductive reasoning in interactive settings.
Method: We introduce RULEARN, the first benchmark for interactive rule discovery, and propose IDEA, the first framework to formalize a closed-loop logical reasoning cycle: abduction for hypothesis generation, deduction for hypothesis verification and problem solving, and induction for rule refinement. IDEA incorporates multi-step reasoning orchestration, dynamic environment simulation, hypothesis generation and validation mechanisms, and iterative rule-update strategies.
Contribution/Results: On RULEARN, IDEA achieves an average 23.6% improvement over strong baselines across five state-of-the-art LLMs. A comparison study with human participants further reveals notable discrepancies between human and LLM rule-learning behaviors. This work establishes a novel paradigm, benchmark, and interpretable reasoning framework for human-like rule learning in LLMs.
📝 Abstract
While large language models (LLMs) have been thoroughly evaluated for deductive and inductive reasoning, their proficiency in holistic rule learning in interactive environments remains less explored. We introduce RULEARN, a novel benchmark to assess the rule-learning abilities of LLM agents in interactive settings. In RULEARN, agents strategically interact with simulated environments to gather observations, discern patterns, and solve complex problems. To enhance the rule-learning capabilities of LLM agents, we propose IDEA, a novel reasoning framework that integrates the processes of Induction, Deduction, and Abduction. The IDEA agent generates initial hypotheses from limited observations through abduction, devises plans to validate these hypotheses or leverages them to solve problems via deduction, and refines previous hypotheses through induction, dynamically establishing and applying rules that mimic human rule-learning behaviors. Our evaluation of the IDEA framework, which involves five representative LLMs, demonstrates significant improvements over the baseline. Furthermore, our study with human participants reveals notable discrepancies in rule-learning behaviors between humans and LLMs. We believe our benchmark will serve as a valuable and challenging resource, and IDEA will provide crucial insights for the development of LLM agents capable of human-like rule learning in real-world scenarios. Our code and data are publicly available.
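The abduction–deduction–induction cycle described above can be illustrated with a toy sketch. This is not the paper's actual implementation: the hidden linear rule, the hypothesis space, and all function names here are hypothetical stand-ins for the LLM-driven components, chosen only to make the closed loop concrete and runnable.

```python
import itertools

def hidden_rule(x):
    # Hypothetical simulated environment: the rule the agent must discover.
    return 2 * x + 1

def abduce(observations):
    # Abduction: propose candidate rules f(x) = a*x + b that explain
    # the observations gathered so far.
    return [(a, b) for a, b in itertools.product(range(-3, 4), repeat=2)
            if all(a * x + b == y for x, y in observations)]

def deduce(hypotheses):
    # Deduction: pick a probe input on which the surviving hypotheses
    # predict different outputs, so the next interaction is informative.
    for x in range(10):
        predictions = {a * x + b for a, b in hypotheses}
        if len(predictions) > 1:
            return x
    return None  # all remaining hypotheses agree on every probed input

def idea_loop(max_steps=10):
    # Induction: interact, observe, and refine the hypothesis set
    # until a single consistent rule remains.
    observations = [(0, hidden_rule(0))]
    hypotheses = abduce(observations)
    for _ in range(max_steps):
        probe = deduce(hypotheses)
        if probe is None:
            break
        observations.append((probe, hidden_rule(probe)))
        hypotheses = abduce(observations)
    return hypotheses

print(idea_loop())  # hypotheses consistent with all observations: [(2, 1)]
```

In this sketch the agent needs only two interactions to isolate the rule, because each probe is chosen where the candidate hypotheses disagree; the real IDEA agent replaces the enumerated hypothesis space and probe selection with LLM-generated hypotheses and plans.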