AI Summary
This work addresses the limited out-of-distribution generalization of large language models (LLMs) on formal deductive reasoning tasks. We propose Meta-learning for In-context Deduction (MIND), a novel few-shot meta-learning fine-tuning framework that trains models to identify the subset of premises in a knowledge base required to derive a given hypothesis. MIND enables systematic generalization to unseen knowledge bases and a more systematic application of inference rules. Evaluated on models ranging from 1.5B to 7B parameters, MIND significantly improves generalization, with the largest gains for smaller models and low-data settings; small models fine-tuned with MIND outperform state-of-the-art LLMs, including GPT-4o and o3-mini, on this task. Our approach bridges a gap between scalable fine-tuning and rigorous formal reasoning, establishing a strong baseline for efficient, generalizable in-context deduction.
Abstract
Large language models (LLMs) are increasingly evaluated on formal tasks, where strong reasoning abilities define the state of the art. However, their ability to generalize to out-of-distribution problems remains limited. In this paper, we investigate how LLMs can achieve a systematic understanding of deductive rules. Our focus is on the task of identifying the appropriate subset of premises within a knowledge base needed to derive a given hypothesis. To tackle this challenge, we propose Meta-learning for In-context Deduction (MIND), a novel few-shot meta-learning fine-tuning approach. The goal of MIND is to enable models to generalize more effectively to unseen knowledge bases and to systematically apply inference rules. Our results show that MIND significantly improves generalization in small LMs ranging from 1.5B to 7B parameters. The benefits are especially pronounced in smaller models and low-data settings. Remarkably, small models fine-tuned with MIND outperform state-of-the-art LLMs, such as GPT-4o and o3-mini, on this task.
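The premise-selection task the abstract describes can be made concrete with a toy sketch. The knowledge base, the Horn-rule format, and all symbol names below are illustrative assumptions, not the paper's actual data or method; the sketch simply searches for the smallest set of facts from which a hypothesis becomes derivable by forward chaining:

```python
from itertools import combinations

# Hypothetical toy knowledge base: atomic facts and Horn rules (body -> head).
FACTS = {"A", "B", "C"}
RULES = [({"A", "B"}, "D"), ({"D"}, "E"), ({"C"}, "F")]

def derives(premises, hypothesis):
    """Forward-chain over RULES to a fixpoint; report whether hypothesis is derived."""
    known = set(premises)
    changed = True
    while changed:
        changed = False
        for body, head in RULES:
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return hypothesis in known

def minimal_premises(hypothesis):
    """Return the smallest subset of FACTS from which hypothesis is derivable."""
    for k in range(len(FACTS) + 1):
        for subset in combinations(sorted(FACTS), k):
            if derives(subset, hypothesis):
                return list(subset)
    return None

print(minimal_premises("E"))  # deriving E needs D, which needs both A and B
```

Brute-force subset enumeration is exponential and only serves to define the target output; the abstract's point is that an LLM should learn to select this minimal premise set directly from context.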