🤖 AI Summary
This work addresses the lack of intrinsic goal-directed reasoning in current large language models by formulating reasoning as an optimal control problem. The authors embed a Test-Time Control (TTC) layer into pretrained language models, enabling endogenous reasoning before prediction through finite-horizon Linear Quadratic Regulator (LQR) planning in the latent state space. Key contributions include the first integration of optimal control as an internal component of a neural network, a hardware-efficient LQR solver based on symplectic geometry, and low-overhead parallel inference via CUDA-fused kernels. Experiments show substantial improvements: a 27.8% gain in mathematical reasoning performance on MATH-500 and 2-3× higher Pass@8 scores on the AMC and AIME benchmarks.
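The planning primitive named here, finite-horizon discrete-time LQR, is classical. As a minimal illustration of what such a planner computes (a NumPy sketch of the standard backward Riccati recursion, not the paper's CUDA implementation; the function name and interface are our own):

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, Qf, N):
    """Backward Riccati recursion for finite-horizon discrete-time LQR.

    Dynamics: x_{k+1} = A x_k + B u_k
    Cost:     x_N' Qf x_N + sum_k (x_k' Q x_k + u_k' R u_k)
    Returns time-varying gains K_0..K_{N-1} and cost-to-go matrices P_0..P_N;
    the optimal control is u_k = -K_k x_k.
    """
    P = Qf
    Ps, Ks = [Qf], []
    for _ in range(N):
        # K_k = (R + B' P_{k+1} B)^{-1} B' P_{k+1} A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_k = Q + A' P_{k+1} (A - B K_k)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
        Ps.append(P)
    Ks.reverse()
    Ps.reverse()
    return Ks, Ps
```

In the TTC setting described above, the "state" would be a latent representation and the recursion would run at inference time to select goal-directed updates before prediction.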
📝 Abstract
Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work relies on reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within the neural architecture, and uses it as a nested objective to enable planning before prediction. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning performance by up to +27.8% on MATH-500 and yield 2-3× Pass@8 improvements on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
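The "symplectic formulation" alluded to in the abstract is, in the classical LQR literature, the observation that the state/costate two-point boundary-value system is propagated by a symplectic transfer matrix; products of symplectic matrices remain symplectic and compose associatively, which is the kind of structure a parallel scan in a fused kernel can exploit. A hedged NumPy sketch of that textbook construction and a check of the symplectic identity MᵀJM = J (this is not the paper's solver; it assumes A is invertible, and all names are our own):

```python
import numpy as np

def symplectic_transfer(A, B, Q, R):
    """Transfer matrix M mapping (x_k, lam_k) -> (x_{k+1}, lam_{k+1}) for the
    LQR state/costate system. Requires A invertible. With S = B R^{-1} B',
    M = [[A + S A^{-T} Q,  -S A^{-T}],
         [-A^{-T} Q,        A^{-T} ]].
    """
    S = B @ np.linalg.solve(R, B.T)   # S = B R^{-1} B'
    Ait = np.linalg.inv(A).T          # A^{-T}
    return np.block([[A + S @ Ait @ Q, -S @ Ait],
                     [-Ait @ Q,         Ait]])

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)  # well-conditioned, invertible
B = rng.standard_normal((n, m))
C = rng.standard_normal((n, n))
Q = C.T @ C                                        # symmetric PSD state cost
R = np.eye(m)                                      # positive-definite input cost

M = symplectic_transfer(A, B, Q, R)
# Canonical symplectic form J = [[0, I], [-I, 0]].
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
# M' J M = J: M preserves the symplectic form, so chains of per-step transfer
# matrices can be combined with an associative (hence parallelizable) product.
```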