On Computation and Reinforcement Learning

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how the computational budget affects the performance of reinforcement learning policies and introduces a formal framework of "compute-bounded policies" that decouples a policy's computation from its parameter count. Building on this framework, the authors design a lightweight dynamic-computation architecture that, under a fixed parameter budget, can allocate additional compute at inference time to improve policy performance and generalization. The approach draws on algorithmic learning theory and model-free planning and applies to both online and offline settings. Experiments across 31 tasks show that spending more compute alone yields significant performance gains: the method outperforms standard feedforward and residual networks, particularly on long-horizon tasks, while using up to five times fewer parameters.

📝 Abstract
How does the amount of compute available to a reinforcement learning (RL) policy affect its learning? Can policies with a fixed number of parameters still benefit from additional compute? The standard RL framework does not provide a language to answer these questions formally. Empirically, deep RL policies are often parameterized as neural networks with static architectures, conflating the amount of compute and the number of parameters. In this paper, we formalize compute-bounded policies and prove that policies which use more compute can solve problems and generalize to longer-horizon tasks that are outside the scope of policies with less compute. Building on prior work in algorithmic learning and model-free planning, we propose a minimal architecture that can use a variable amount of compute. Our experiments complement our theory. On a set of 31 different tasks spanning online and offline RL, we show that $(1)$ this architecture achieves stronger performance simply by using more compute, and $(2)$ stronger generalization on longer-horizon test tasks compared to standard feedforward networks or deep residual networks using up to 5 times more parameters.
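To make the decoupling of compute from parameters concrete, here is a minimal sketch of one common way to realize a variable-compute architecture: a weight-tied residual block applied a configurable number of times. This is an illustrative assumption, not the paper's exact architecture; all dimensions, names (`policy_logits`, `n_iters`), and the tanh/residual update are hypothetical. The point it demonstrates is that iterating a shared block lets inference-time compute grow while the parameter count stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed parameter budget: one input map, one shared (weight-tied)
# block, one output map. Dimensions are arbitrary for illustration.
obs_dim, hidden_dim, n_actions = 8, 32, 4
W_in = rng.normal(0.0, 0.1, (obs_dim, hidden_dim))
W_block = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))  # reused each iteration
W_out = rng.normal(0.0, 0.1, (hidden_dim, n_actions))

def policy_logits(obs, n_iters):
    """Return action logits after n_iters applications of the shared block.

    The parameter count is independent of n_iters: raising n_iters
    spends more compute without adding any weights.
    """
    h = np.tanh(obs @ W_in)
    for _ in range(n_iters):
        h = h + np.tanh(h @ W_block)  # weight-tied residual update
    return h @ W_out

obs = rng.normal(size=obs_dim)
shallow = policy_logits(obs, n_iters=2)   # low compute
deep = policy_logits(obs, n_iters=16)     # 8x compute, identical parameters
print(shallow.shape, deep.shape)
```

Varying `n_iters` at test time is what makes "more compute, same parameters" a meaningful experimental axis, in contrast to a static feedforward network whose depth and parameter count move together.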
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
compute
generalization
policy
long-horizon tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

compute-bounded policies
variable compute
generalization in RL
algorithmic learning
model-free planning
Raj Ghugare
Department of Computer Science, Princeton University
Michal Bortkiewicz
Department of Computer Science, Princeton University; Warsaw University of Technology
Alicja Ziarko
University of Warsaw, Ideas NCBR, Institute of Mathematics of the Polish Academy of Sciences
Benjamin Eysenbach
Princeton University
Reinforcement Learning