Can Large Language Models Reinvent Foundational Algorithms?

📅 2026-04-07
🤖 AI Summary
This study investigates whether large language models (LLMs) can independently reinvent classical computer science algorithms, thereby assessing their foundational capacity for innovation. To this end, the authors propose an "Unlearn-and-Reinvent" pipeline: first, specific algorithmic knowledge is removed from the model using a GRPO-based, on-policy unlearning method, and then the model's ability to reinvent those algorithms is evaluated in a controlled setting. This work presents the first systematic assessment of LLMs' algorithm-reinvention capabilities, revealing the critical roles of high-level hints and generative verifiers in sustaining coherent reasoning. Notably, test-time reinforcement learning enables the successful reinvention of Strassen's algorithm. Experiments show that the Qwen3-4B-Thinking-2507 model can reinvent 50% of the target algorithms with no hints, with the success rate rising to 90% when given tailored hints.
📝 Abstract
LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our *Unlearn-and-Reinvent* pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model, Qwen3-4B-Thinking-2507, successfully reinvents 50% of the algorithms with no hints, 70% at hint level 1, and 90% at hint level 2; (2) a few high-level hints can raise the reinvention success rate, but even step-by-step hints fail for the more complicated algorithms; and (3) test-time reinforcement learning enables successful reinvention of the Strassen algorithm at hint level 2. Through analyses of output trajectories and ablation studies, we find that the generative verifier in the reinvention phase plays a critical role in sustaining the models' reasoning strength, helping to avoid the "thought collapse" phenomenon. These findings offer insights into both the potential and the current limits of LLMs' innovative thinking.
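For readers unfamiliar with the hardest target mentioned in the abstract: Strassen's algorithm multiplies two n×n matrices using 7 recursive block products instead of the naive 8, giving an O(n^2.81) runtime. A minimal sketch of the classical algorithm (purely illustrative; not the paper's code, and the `leaf` cutoff parameter is our own choice):

```python
import numpy as np

def strassen(A, B, leaf=64):
    """Multiply square matrices A and B (size a power of two) via Strassen's method."""
    n = A.shape[0]
    if n <= leaf:  # fall back to ordinary multiplication for small blocks
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of the naive eight
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Recombine the seven products into the four output blocks
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

Reinventing this recombination scheme from scratch, rather than recalling it, is what the paper's hint-level-2 plus test-time RL setting tests.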
Problem

Research questions and friction points this paper is trying to address:

- Large Language Models
- foundational algorithms
- algorithm reinvention
- scientific discovery
- innovative thinking
Innovation

Methods, ideas, or system contributions that make the work stand out:

- Unlearn-and-Reinvent
- LLM unlearning
- foundational algorithm reinvention
- generative verifier
- test-time reinforcement learning
👥 Authors

- Jian Zhao — Zhongguancun Institute of Artificial Intelligence (Reinforcement Learning, Multi-Agent System)
- Haoren Luo — Institute for Interdisciplinary Information Sciences, Tsinghua University
- Yu Wang — Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
- Yuhan Cao — Independent Researcher
- Pingyue Sheng — Institute for Interdisciplinary Information Sciences, Tsinghua University
- Tianxing He — Tsinghua University (NLP)