Algorithmic Language Models with Neurally Compiled Libraries

📅 2024-07-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Large language models (LLMs) inherently lack native algorithmic execution and generalization capabilities due to architectural and training paradigm limitations. To address this, we propose NeuroCompiler—a framework that embeds differentiable computational primitives (memory, registers, elementary operations, and adaptive recursion) into the LLaMA3 architecture, enabling end-to-end compilation of classical algorithms into gradient-optimizable neural libraries. Crucially, it natively integrates computational abstractions—such as loops and state updates—into the Transformer backbone, supporting joint optimization over deeply variable algorithmic tasks. Our method comprises an enhanced Transformer, a differentiable storage system, a neural compiler, and a lightweight algorithmic fine-tuning strategy. Empirical evaluation on sequence reversal, counting, and stack simulation demonstrates substantial improvements in out-of-distribution generalization and reasoning robustness. These results validate the feasibility and effectiveness of the neural compilation paradigm in endowing LLMs with principled algorithmic reasoning capabilities.

Technology Category

Application Category

📝 Abstract

Important tasks such as reasoning and planning are fundamentally algorithmic, meaning that solving them robustly requires acquiring true reasoning or planning algorithms, rather than shortcuts. Large Language Models lack true algorithmic ability primarily because of the limitations of neural network optimization algorithms, their optimization data and optimization objective, but also due to architectural inexpressivity. To solve this, our paper proposes augmenting LLMs with a library of fundamental operations and sophisticated differentiable programs, so that common algorithms do not need to be learned from scratch. We add memory, registers, basic operations, and adaptive recurrence to a transformer architecture built on LLaMA3. Then, we define a method for directly compiling algorithms into a differentiable starting library, which is used natively and propagates gradients for optimization. In this preliminary study, we explore the feasability of augmenting LLaMA3 with a differentiable computer, for instance by fine-tuning small transformers on simple algorithmic tasks with variable computational depth.

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs with algorithmic reasoning capabilities

Addressing neural networks' limitations in learning algorithms

Compiling algorithms into differentiable libraries for optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Augmenting LLMs with fundamental operations library

Adding memory, registers, and adaptive recurrence

Compiling algorithms into differentiable starting library

🔎 Similar Papers

Small Language Models Also Work With Small Vocabularies: Probing the Linguistic Abilities of Grapheme- and Phoneme-Based Baby Llamas