🤖 AI Summary
To address the dual challenges of natural language ambiguity and the limited expressiveness of formal symbolic languages in logical and mathematical reasoning with large language models (LLMs), this paper proposes HYBRIDMIND, an adaptive meta-selection mechanism that chooses between natural language reasoning and first-order logic (FOL)-based symbolic reasoning for each problem. The core contribution is a learnable meta-selector over reasoning paradigms that can be deployed either through prompting or through fine-tuning. With fine-tuning, LLaMA-3.1-8B-Instruct serves as the meta-classifier and is combined with multi-stage prompt engineering and hybrid reasoning-chain construction; this setup outperforms GPT-4o's natural language reasoning by 4.4% on FOLIO and 1.3% on MATH. With prompting, using GPT-3.5-turbo as the meta-selector yields a 10% improvement over GPT-4o on FOLIO's challenging subset. The authors state that code and data will be released.
📝 Abstract
LLMs approach logical and mathematical reasoning through natural or symbolic languages. Natural language offers human-accessible flexibility but suffers from ambiguity, whereas symbolic reasoning provides precise, machine-executable inferences at the cost of strict domain constraints. We introduce HYBRIDMIND, an adaptive strategy that selects the optimal reasoning approach for each reasoning problem. Through extensive experiments, we evaluate both prompting-based approaches with state-of-the-art LLMs and fine-tuned open-source models. We find that fine-tuning LLaMA-3.1-8B-Instruct as a meta-selector outperforms GPT-4o's natural language reasoning by 4.4% on FOLIO and 1.3% on MATH. More notably, using GPT-3.5-turbo as a prompted meta-selector yields a 10% improvement on FOLIO's challenging subset compared to GPT-4o. We will release our code and data to support future research.
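The per-problem selection described in the abstract can be pictured as a small routing layer. The sketch below is illustrative only and is not the authors' implementation: `HybridRouter`, `toy_selector`, the label strings, and the fallback to natural language reasoning are all hypothetical names and assumptions. In the paper's setting the selector would be a prompted or fine-tuned LLM and the solvers would be real natural language and symbolic reasoning pipelines; here they are stand-in functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical labels for the two reasoning paradigms named in the abstract.
NL = "natural_language"
SYMBOLIC = "symbolic"

Solver = Callable[[str], str]    # problem text -> answer text
Selector = Callable[[str], str]  # problem text -> paradigm label


@dataclass
class HybridRouter:
    """Routes each problem to one reasoning path chosen by a meta-selector."""
    selector: Selector
    solvers: Dict[str, Solver]
    fallback: str = NL  # assumption: default to NL when the label is unknown

    def choose(self, problem: str) -> str:
        # Normalize the selector output and guard against unexpected labels.
        label = self.selector(problem).strip().lower()
        return label if label in self.solvers else self.fallback

    def solve(self, problem: str) -> str:
        return self.solvers[self.choose(problem)](problem)


def toy_selector(problem: str) -> str:
    # Placeholder keyword heuristic, NOT the paper's selector: a real system
    # would call a prompted or fine-tuned LLM here.
    cues = ("all ", "every ", "if ", "no ")
    text = problem.lower()
    return SYMBOLIC if any(cue in text for cue in cues) else NL


router = HybridRouter(
    selector=toy_selector,
    solvers={
        NL: lambda p: f"[NL reasoning] {p}",        # stand-in solver
        SYMBOLIC: lambda p: f"[FOL reasoning] {p}",  # stand-in solver
    },
)

print(router.solve("All birds fly. Tweety is a bird. Does Tweety fly?"))
print(router.solve("What is 12 times 7?"))
```

Keeping the selector and the solvers as plain callables means a prompted selector and a fine-tuned selector can be swapped in without touching the routing logic, which mirrors the two deployment modes the abstract compares.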