Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

📅 2024-11-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work investigates the generalization mechanisms underlying large language models' (LLMs) mathematical reasoning capabilities, challenging the assumption that such reasoning relies on memorized answers or simple retrieval. Method: Using influence-function analysis over pretraining data, the authors identify the documents most influential on model outputs for three mathematical reasoning tasks, examine them via qualitative content analysis, and contrast them with the documents influential for factual questions, for both 7B- and 35B-parameter models. Contribution/Results: The work provides empirical evidence that distinct reasoning questions within the same task draw on a shared set of pretraining documents containing procedural knowledge—e.g., solution steps, formula derivations, and code demonstrating how to solve similar problems. Crucially, these influential documents rarely contain the final answers or intermediate results, yet they support robust generalization across questions within the same task. The findings indicate that LLMs' mathematical reasoning stems from synthesizing and recomposing procedural knowledge rather than from fact-based retrieval, reframing reasoning as a generalizable, compositional strategy and offering new insight into the origins and robustness of model reasoning.
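The paper attributes model outputs to pretraining documents with influence functions. The toy sketch below illustrates the general idea with a much cruder first-order score — a TracIn-style gradient dot product on a tiny logistic model. The model, data, and scores here are illustrative assumptions, not the paper's method or data:

```python
# Toy sketch of influence-style document attribution. We approximate a
# document's influence on a query as the dot product between the loss
# gradient on that document and the loss gradient on the query (a
# first-order, TracIn-style score). Everything below is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def grad_logistic(w, x, y):
    """Gradient of the logistic loss for one example (x, y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

# Toy "pretraining documents" as feature vectors with labels.
docs = [(rng.normal(size=4), y) for y in (0, 1, 1, 0, 1)]
w = rng.normal(size=4)                      # toy model parameters
query_x, query_y = rng.normal(size=4), 1    # one "reasoning query"

# Score each document against the query and rank (most influential first).
g_query = grad_logistic(w, query_x, query_y)
scores = [float(grad_logistic(w, x, y) @ g_query) for x, y in docs]
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
print(ranking)
```

The paper's actual estimator is far more involved (second-order influence functions scaled to billions of tokens), but the output has the same shape: a per-document influence score that can be ranked and inspected per query.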

📝 Abstract
The capabilities and limitations of Large Language Models have been sketched out in great detail in recent years, providing an intriguing yet conflicting picture. On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies. The sheer volume of data used in the design of LLMs has precluded us from applying the method traditionally used to measure generalisation: train-test set separation. To overcome this, we study what kind of generalisation strategies LLMs employ when performing reasoning tasks by investigating the pretraining data they rely on. For two models of different sizes (7B and 35B) and 2.5B of their pretraining tokens, we identify what documents influence the model outputs for three simple mathematical reasoning tasks and contrast this to the data that are influential for answering factual questions. We find that, while the models rely on mostly distinct sets of data for each factual question, a document often has a similar influence across different reasoning questions within the same task, indicating the presence of procedural knowledge. We further find that the answers to factual questions often show up in the most influential data. However, for reasoning questions the answers usually do not show up as highly influential, nor do the answers to the intermediate reasoning steps. When we characterise the top ranked documents for the reasoning questions qualitatively, we confirm that the influential documents often contain procedural knowledge, like demonstrating how to obtain a solution using formulae or code. Our findings indicate that the approach to reasoning the models use is unlike retrieval, and more like a generalisable strategy that synthesises procedural knowledge from documents doing a similar form of reasoning.
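One way to picture the abstract's central finding — that a document tends to have a similar influence across different reasoning questions within a task, but not across factual questions — is to correlate per-document influence scores between pairs of queries. The sketch below uses synthetic scores purely for illustration; it is not the paper's data:

```python
# Synthetic illustration of cross-query influence correlation: two
# "reasoning" queries share a common procedural component in their
# per-document influence scores, while two "factual" queries do not.
import numpy as np

rng = np.random.default_rng(1)
n_docs = 1000

shared = rng.normal(size=n_docs)  # shared procedural-knowledge signal
reasoning_a = shared + 0.3 * rng.normal(size=n_docs)
reasoning_b = shared + 0.3 * rng.normal(size=n_docs)
factual_a = rng.normal(size=n_docs)   # independent: answer-specific docs
factual_b = rng.normal(size=n_docs)

# Pearson correlation of influence scores between query pairs.
r_reason = np.corrcoef(reasoning_a, reasoning_b)[0, 1]
r_fact = np.corrcoef(factual_a, factual_b)[0, 1]
print(f"reasoning-pair correlation: {r_reason:.2f}")
print(f"factual-pair correlation:   {r_fact:.2f}")
```

Under this toy construction the reasoning pair correlates strongly while the factual pair does not — the qualitative pattern the abstract reports for real influence scores.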
Problem

Research questions and friction points this paper is trying to address.

How do LLMs generalize when performing reasoning tasks, given that pretraining-scale data precludes traditional train-test separation?
What role does procedural knowledge in pretraining data play in mathematical reasoning?
How does the influence of pretraining documents differ between factual and reasoning questions?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scales influence analysis over 2.5B pretraining tokens for two model sizes (7B and 35B)
Identifies procedural knowledge (formulae, code, solution demonstrations) in the most influential documents
Shows that reasoning draws on generalizable strategies rather than retrieval of stored answers