Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how large language models (LLMs) acquire reusable algorithmic abstractions through training on source code, and how that acquisition relates to their general reasoning abilities. Method: The paper proposes *Programming by Backprop* (PBB), a training paradigm in which a model learns to evaluate a program for inputs by finetuning on its source code alone, without ever seeing input-output examples. Models finetuned this way can produce outputs either directly, by implicitly evaluating the program within the forward pass, or more reliably by stepping through the program in-context via chain-of-thought (CoT). Contribution/Results: Experiments show that PBB works significantly better when programs are provided as code rather than semantically equivalent natural-language descriptions, and that it yields more robust evaluation of programs across inputs than training on I/O pairs drawn from a distribution mirroring naturally occurring data. The findings suggest a mechanism behind code training's reasoning benefits: it allows LLMs to internalise reusable algorithmic abstractions.

📝 Abstract
Training large language models (LLMs) on source code significantly enhances their general-purpose reasoning abilities, but the mechanisms underlying this generalisation are poorly understood. In this paper, we propose Programming by Backprop (PBB) as a potential driver of this effect - teaching a model to evaluate a program for inputs by training on its source code alone, without ever seeing I/O examples. To explore this idea, we finetune LLMs on two sets of programs representing simple maths problems and algorithms: one with source code and I/O examples (w/ IO), the other with source code only (w/o IO). We find evidence that LLMs have some ability to evaluate w/o IO programs for inputs in a range of experimental settings, and make several observations. Firstly, PBB works significantly better when programs are provided as code rather than semantically equivalent language descriptions. Secondly, LLMs can produce outputs for w/o IO programs directly, by implicitly evaluating the program within the forward pass, and more reliably when stepping through the program in-context via chain-of-thought. We further show that PBB leads to more robust evaluation of programs across inputs than training on I/O pairs drawn from a distribution that mirrors naturally occurring data. Our findings suggest a mechanism for enhanced reasoning through code training: it allows LLMs to internalise reusable algorithmic abstractions. Significant scope remains for future work to enable LLMs to more effectively learn from symbolic procedures, and progress in this direction opens other avenues like model alignment by training on formal constitutional principles.
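The two finetuning conditions the abstract describes (w/ IO and w/o IO) can be sketched as data construction. The program, naming scheme, and prompt format below are illustrative assumptions, not the paper's actual dataset:

```python
# Sketch of the two finetuning conditions: "w/o IO" gives the model only a
# program's source code; "w/ IO" additionally gives input-output examples.
# The example program f(x) = 3*x + 2 and the text format are assumptions.

SOURCE = """def f(x):
    return 3 * x + 2"""


def make_program(name: str, source: str) -> str:
    """Format a named program as raw source code for the training corpus."""
    return f"# program: {name}\n{source}"


def wo_io_example(name: str) -> str:
    # w/o IO condition: source code alone, never any I/O pairs.
    return make_program(name, SOURCE)


def w_io_example(name: str, inputs: list[int]) -> list[str]:
    # w/ IO condition: the same source code plus concrete I/O demonstrations.
    f = lambda x: 3 * x + 2
    return [make_program(name, SOURCE)] + [f"{name}({x}) = {f(x)}" for x in inputs]
```

Under PBB, a model finetuned only on `wo_io_example` documents would later be queried with unseen inputs (e.g. `prog_a(7) =`) to test whether it internalised the program.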
Problem

Research questions and friction points this paper is trying to address.

Understanding how LLMs generalize reasoning through code training
Exploring Programming by Backprop for program evaluation without I/O examples
Investigating reusable algorithmic abstractions in LLMs via code training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Finetuning LLMs on program source code alone, without I/O examples
Implicit program evaluation within the forward pass
Chain-of-thought enhances program evaluation
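The chain-of-thought mode above amounts to verbalising a program's execution step by step rather than emitting the answer in one shot. A toy trace for an assumed program f(x) = 3*x + 2 (not the paper's prompt format) illustrates what such in-context stepping looks like:

```python
# Toy illustration of CoT-style program evaluation: the trace spells out the
# intermediate arithmetic a model would verbalise before giving the answer.
# The program f(x) = 3*x + 2 and the step wording are assumptions.

def cot_trace(x: int) -> list[str]:
    steps = [f"call f({x})"]
    prod = 3 * x
    steps.append(f"3 * {x} = {prod}")
    result = prod + 2
    steps.append(f"{prod} + 2 = {result}")
    steps.append(f"return {result}")
    return steps
```

Direct evaluation corresponds to producing only the final `return` line; the paper reports that the stepwise variant is the more reliable of the two.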