🤖 AI Summary
This study investigates how programming language selection and syntactic features during pretraining influence model logical reasoning capabilities. Method: We train decoder-only models from scratch on datasets comprising ten programming languages and three natural languages, using a unified architecture, and evaluate few-shot performance on logic-only benchmarks (FLD and bAbI). Contribution/Results: We provide the first empirical evidence of a causal improvement in logical reasoning from programming language pretraining. Key findings are: (1) Structured syntactic features—particularly abstract syntax tree (AST) depth—exhibit significant positive correlation with reasoning performance; (2) Programming language–pretrained models consistently outperform natural language–only baselines on pure logical reasoning tasks; (3) Instruction-following ability improves concurrently, suggesting that structured priors acquired through programming language pretraining induce an implicit, generalizable reasoning mechanism. These results highlight the value of formal syntax as an inductive bias for enhancing foundational reasoning capacities in language models.
📝 Abstract
Recent large language models (LLMs) have demonstrated remarkable generalization abilities in mathematics and logical reasoning tasks.Prior research indicates that LLMs pre-trained with programming language data exhibit high mathematical and reasoning abilities; however, this causal relationship has not been rigorously tested. Our research aims to verify which programming languages and features during pre-training affect logical inference performance. Specifically, we pre-trained decoder-based language models from scratch using datasets from ten programming languages (e.g., Python, C, Java) and three natural language datasets (Wikipedia, Fineweb, C4) under identical conditions. Thereafter, we evaluated the trained models in a few-shot in-context learning setting on logical reasoning tasks: FLD and bAbi, which do not require commonsense or world knowledge. The results demonstrate that nearly all models trained with programming languages consistently outperform those trained with natural languages, indicating that programming languages contain factors that elicit logic inference performance. In addition, we found that models trained with programming languages exhibit a better ability to follow instructions compared to those trained with natural languages. Further analysis reveals that the depth of Abstract Syntax Trees representing parsed results of programs also affects logical reasoning performance. These findings will offer insights into the essential elements of pre-training for acquiring the foundational abilities of LLMs.