🤖 AI Summary
This work addresses an inefficiency of reinforcement learning agents: their computational cost does not decrease as their performance improves, whereas humans expend less cognitive effort as they grow more proficient at a task. The authors propose an agent framework that can reason about and regulate its own compute usage. The core idea is to expose the cost of computation to the agent during training and give it a dynamic computation control mechanism, so it can explicitly perceive and actively manage its own computational overhead. Experiments on the Arcade Learning Environment show that, under the same training compute budget, compute-aware agents outperform baselines on 75% of test games while using three times less compute on average. The approach yields a human-like computational adaptivity, in which effort decreases with proficiency, and points toward more energy-efficient agents with compute cycles freed for other processes such as planning.
📝 Abstract
While reinforcement learning agents can achieve superhuman performance in many complex tasks, they typically do not become more computationally efficient as they improve. In contrast, humans gradually require less cognitive effort as they become more proficient at a task. If agents could reason about their compute as they learn, could they similarly reduce their computational footprint? If so, we could build more energy-efficient agents or free up compute cycles for other processes, such as planning. In this paper, we experiment with showing agents the cost of their computation and giving them the ability to control when they use compute. We conduct our experiments on the Arcade Learning Environment, and our results demonstrate that, with the same training compute budget, agents that reason about their compute perform better on 75% of games. Furthermore, these agents use three times less compute on average. We analyze individual games and show where agents gain these efficiencies.
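To make the mechanism concrete, below is a minimal sketch in Python of the general idea described in the abstract. Everything here (the `ComputeAwareAgent` class, the `COMPUTE_COST` constant, the coin-flip gate) is a hypothetical illustration, not the paper's implementation: it only shows how a per-step compute cost can be surfaced to the learner through the reward, and how an agent can choose between an expensive forward pass and a cheap cached action.

```python
import random

# Hypothetical per-inference penalty; the paper's actual cost signal may differ.
COMPUTE_COST = 0.01


class ComputeAwareAgent:
    """Toy agent that decides each step whether to spend compute.

    If it skips computation, it reuses its cached action for free;
    if it computes, it pays COMPUTE_COST, which is later subtracted
    from the environment reward so the cost is visible to the learner.
    """

    def __init__(self, num_actions: int):
        self.num_actions = num_actions
        self.cached_action = 0

    def gate(self, observation) -> bool:
        """Decide whether to run the expensive policy this step.

        A real agent would learn this gate; here it is a coin flip
        purely to make the control flow concrete.
        """
        return random.random() < 0.5

    def policy(self, observation) -> int:
        """Stand-in for an expensive forward pass through a network."""
        return random.randrange(self.num_actions)

    def act(self, observation) -> tuple[int, float]:
        """Return (action, compute cost incurred) for this step."""
        if self.gate(observation):
            self.cached_action = self.policy(observation)
            return self.cached_action, COMPUTE_COST
        return self.cached_action, 0.0


def shaped_reward(env_reward: float, compute_cost: float) -> float:
    """Expose compute cost to the agent by folding it into the reward."""
    return env_reward - compute_cost
```

In a full agent, the gate itself would be learned, for example as an extra action head trained on the shaped reward, so the policy is rewarded for skipping computation whenever the cached action suffices.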