🤖 AI Summary
This work addresses the lack of intrinsic goal-directed reasoning in current large language models by formulating reasoning as an optimal control problem. The authors embed a Test-Time Control (TTC) layer into pretrained language models, enabling endogenous reasoning before prediction through finite-horizon Linear Quadratic Regulator (LQR) planning in the latent state space. Key contributions include the first integration of optimal control as an internal component of a neural network, a hardware-efficient LQR solver based on symplectic geometry, and low-overhead parallel inference via CUDA-fused kernels. Experiments show substantial improvements: a 27.8% gain in mathematical reasoning performance on MATH-500 and 2-3× higher Pass@8 scores on the AMC and AIME benchmarks.
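The planning primitive named here, finite-horizon discrete-time LQR, is classical. As a minimal illustration of what such a planner computes (a NumPy sketch of the standard backward Riccati recursion, not the paper's CUDA implementation; the function name and interface are our own):

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, Qf, N):
    """Backward Riccati recursion for finite-horizon discrete-time LQR.

    Dynamics: x_{k+1} = A x_k + B u_k
    Cost:     x_N' Qf x_N + sum_k (x_k' Q x_k + u_k' R u_k)
    Returns time-varying gains K_0..K_{N-1} and cost-to-go matrices P_0..P_N;
    the optimal control is u_k = -K_k x_k.
    """
    P = Qf
    Ps, Ks = [Qf], []
    for _ in range(N):
        # K_k = (R + B' P_{k+1} B)^{-1} B' P_{k+1} A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P_k = Q + A' P_{k+1} (A - B K_k)
        P = Q + A.T @ P @ (A - B @ K)
        Ks.append(K)
        Ps.append(P)
    Ks.reverse()
    Ps.reverse()
    return Ks, Ps
```

In the TTC setting described above, the "state" would be a latent representation and the recursion would run at inference time to select goal-directed updates before prediction.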
📝 Abstract
Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work relies on reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within the neural architecture, and uses it as a nested objective to enable planning before prediction. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning performance by up to +27.8% on MATH-500 and yield 2-3× Pass@8 improvements on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
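The "symplectic formulation" alluded to in the abstract is, in the classical LQR literature, the observation that the state/costate two-point boundary-value system is propagated by a symplectic transfer matrix; products of symplectic matrices remain symplectic and compose associatively, which is the kind of structure a parallel scan in a fused kernel can exploit. A hedged NumPy sketch of that textbook construction and a check of the symplectic identity MᵀJM = J (this is not the paper's solver; it assumes A is invertible, and all names are our own):

```python
import numpy as np

def symplectic_transfer(A, B, Q, R):
    """Transfer matrix M mapping (x_k, lam_k) -> (x_{k+1}, lam_{k+1}) for the
    LQR state/costate system. Requires A invertible. With S = B R^{-1} B',
    M = [[A + S A^{-T} Q,  -S A^{-T}],
         [-A^{-T} Q,        A^{-T} ]].
    """
    S = B @ np.linalg.solve(R, B.T)   # S = B R^{-1} B'
    Ait = np.linalg.inv(A).T          # A^{-T}
    return np.block([[A + S @ Ait @ Q, -S @ Ait],
                     [-Ait @ Q,         Ait]])

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n)) + 3.0 * np.eye(n)  # well-conditioned, invertible
B = rng.standard_normal((n, m))
C = rng.standard_normal((n, n))
Q = C.T @ C                                        # symmetric PSD state cost
R = np.eye(m)                                      # positive-definite input cost

M = symplectic_transfer(A, B, Q, R)
# Canonical symplectic form J = [[0, I], [-I, 0]].
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
# M' J M = J: M preserves the symplectic form, so chains of per-step transfer
# matrices can be combined with an associative (hence parallelizable) product.
```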