Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

📅 2025-08-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how multi-step reasoning capability can be decoupled from memorization effects in large language models (LLMs), aiming to isolate and quantify pure reasoning depth. To eliminate semantic and long-term memory confounds, the authors introduce a controllable benchmark framework grounded in cellular automata, where state sequences are generated via random Boolean functions, ensuring there are no semantic dependencies or memorizable patterns. Within this setting, they systematically evaluate how recurrent architectural designs, explicit memory mechanisms, and test-time compute scaling jointly influence effective reasoning depth. Empirical results show that combining these three mechanisms substantially improves generalization on multi-step state prediction, and that reasoning accuracy increases monotonically with effective depth. The study establishes a controlled analysis paradigm for LLM reasoning and provides empirical grounding for enhancing it, advancing both the theoretical understanding and the practical design of reasoning-capable foundation models.

📝 Abstract
Reasoning is a core capability of large language models, yet understanding how they learn and perform multi-step reasoning remains an open problem. In this study, we explore how different architectures and training methods affect multi-step reasoning capabilities within a cellular automata framework. By training on state sequences generated by applying random Boolean functions to random initial conditions, which excludes memorization, we demonstrate that most neural architectures learn to abstract the underlying rules. While models achieve high accuracy in next-state prediction, their performance declines sharply when multi-step reasoning is required. We confirm that increasing model depth plays a crucial role in sequential computation. We demonstrate that extending the effective model depth with recurrence, memory, and test-time compute scaling substantially enhances reasoning capabilities.
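The abstract's data-generation setup can be sketched minimally: a 1-D cellular automaton whose update rule is a randomly sampled Boolean function of a 3-cell neighborhood, applied to a random initial state for several steps. The function names, neighborhood size, and periodic boundary are illustrative assumptions, not details taken from the paper.

```python
import random

def random_boolean_rule(neighborhood_size=3, seed=0):
    # Sample a random truth table: one output bit per neighborhood pattern.
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(2 ** neighborhood_size)]

def step(state, rule):
    # Apply the rule to every cell, using periodic boundary conditions.
    n = len(state)
    out = []
    for i in range(n):
        pattern = (state[(i - 1) % n] << 2) | (state[i] << 1) | state[(i + 1) % n]
        out.append(rule[pattern])
    return out

def generate_sequence(width=16, steps=4, seed=0):
    # A training example: random rule, random initial state, multi-step rollout.
    rng = random.Random(seed)
    rule = random_boolean_rule(seed=seed)
    state = [rng.randint(0, 1) for _ in range(width)]
    seq = [state]
    for _ in range(steps):
        state = step(state, rule)
        seq.append(state)
    return rule, seq
```

Next-state prediction corresponds to mapping `seq[t]` to `seq[t+1]`; the multi-step task asks for `seq[t+k]` directly, which is where the abstract reports a sharp accuracy decline.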
Problem

Research questions and friction points this paper is trying to address.

Why neural architectures' accuracy declines sharply on multi-step reasoning
Whether recurrence, memory, and test-time compute scaling can extend effective model depth
How to measure reasoning beyond memorization using a cellular automata framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extending model depth with recurrence
Enhancing reasoning using memory mechanisms
Scaling test-time compute for performance
🔎 Similar Papers
No similar papers found.