Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?

📅 2024-10-09

🏛️ Conference on Empirical Methods in Natural Language Processing

📈 Citations: 3

✨ Influential: 0

career value

155K/year

🤖 AI Summary

This study investigates how programming language selection and syntactic features during pretraining influence model logical reasoning capabilities. Method: We train decoder-only models from scratch on datasets comprising ten programming languages and three natural languages, using a unified architecture, and evaluate few-shot performance on logic-only benchmarks (FLD and bAbI). Contribution/Results: We provide the first empirical evidence of a causal improvement in logical reasoning from programming language pretraining. Key findings are: (1) Structured syntactic features—particularly abstract syntax tree (AST) depth—exhibit significant positive correlation with reasoning performance; (2) Programming language–pretrained models consistently outperform natural language–only baselines on pure logical reasoning tasks; (3) Instruction-following ability improves concurrently, suggesting that structured priors acquired through programming language pretraining induce an implicit, generalizable reasoning mechanism. These results highlight the value of formal syntax as an inductive bias for enhancing foundational reasoning capacities in language models.

Technology Category

Application Category

📝 Abstract

Recent large language models (LLMs) have demonstrated remarkable generalization abilities in mathematics and logical reasoning tasks.Prior research indicates that LLMs pre-trained with programming language data exhibit high mathematical and reasoning abilities; however, this causal relationship has not been rigorously tested. Our research aims to verify which programming languages and features during pre-training affect logical inference performance. Specifically, we pre-trained decoder-based language models from scratch using datasets from ten programming languages (e.g., Python, C, Java) and three natural language datasets (Wikipedia, Fineweb, C4) under identical conditions. Thereafter, we evaluated the trained models in a few-shot in-context learning setting on logical reasoning tasks: FLD and bAbi, which do not require commonsense or world knowledge. The results demonstrate that nearly all models trained with programming languages consistently outperform those trained with natural languages, indicating that programming languages contain factors that elicit logic inference performance. In addition, we found that models trained with programming languages exhibit a better ability to follow instructions compared to those trained with natural languages. Further analysis reveals that the depth of Abstract Syntax Trees representing parsed results of programs also affects logical reasoning performance. These findings will offer insights into the essential elements of pre-training for acquiring the foundational abilities of LLMs.

Problem

Research questions and friction points this paper is trying to address.

Identify programming languages boosting logical inference in LLMs

Assess impact of pre-training features on reasoning performance

Compare programming vs natural language data for logic tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained models with programming languages outperform natural languages

Abstract Syntax Tree depth impacts logical reasoning performance

Programming language features enhance instruction-following ability

🔎 Similar Papers

No similar papers found.