🤖 AI Summary
This work proposes AIQI, the first model-free universal reinforcement learning agent proven to achieve strong asymptotic ε-optimality and asymptotic ε-Bayes-optimality. All previously established optimal agents in general reinforcement learning, including AIXI, are model-based: they explicitly maintain and use environment models. AIQI instead performs universal induction over distributional action-value functions (Q-induction) and uses it for decision-making, rather than inducing policies or environments as earlier universal agents do. Under a grain-of-truth condition, AIQI is proven to satisfy both forms of asymptotic near-optimality. The result departs from the model-based paradigm of existing universal agents and broadens the set of known universal agents to include a model-free one.
📝 Abstract
In general reinforcement learning, all established optimal agents, including AIXI, are model-based, explicitly maintaining and using environment models. This paper introduces Universal AI with Q-Induction (AIQI), the first model-free agent proven to be asymptotically $\varepsilon$-optimal in general RL. AIQI performs universal induction over distributional action-value functions, rather than over policies or environments as in previous work. Under a grain of truth condition, we prove that AIQI is strong asymptotically $\varepsilon$-optimal and asymptotically $\varepsilon$-Bayes-optimal. Our results significantly expand the diversity of known universal agents.
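The toy sketch below makes "induction over distributional action-value functions" and the grain-of-truth condition concrete. It is not AIQI. The abstract does not specify AIQI's hypothesis class, prior, or action-selection rule, so everything here is a hypothetical stand-in. The sketch assumes a stateless two-action setting with binary returns. It uses a finite class of three hypothetical distributional Q-functions, each mapping an action to a distribution over returns, in place of a universal class. Hand-picked prior weights replace a universal prior, and a fixed alternating exploration schedule replaces whatever exploration AIQI uses. The true return distribution is included in the class, a minimal analogue of the grain-of-truth condition, so the Bayesian posterior concentrates on it and the mixture's greedy action becomes optimal.

```python
import math
from typing import Dict, List

# Hypothetical toy types: a "distributional Q-function" maps each action to a
# probability distribution over a finite set of return values.
ReturnDist = Dict[float, float]
DistQ = Dict[int, ReturnDist]


def normalize(log_w: List[float]) -> List[float]:
    """Turn unnormalized log-weights into posterior probabilities."""
    m = max(log_w)  # assumes at least one hypothesis has nonzero likelihood
    exp_w = [math.exp(lw - m) for lw in log_w]
    total = sum(exp_w)
    return [w / total for w in exp_w]


def update(log_w: List[float], hyps: List[DistQ], action: int, ret: float) -> List[float]:
    """Bayes rule: add each hypothesis' log-likelihood of the observed return."""
    new_log_w = []
    for lw, h in zip(log_w, hyps):
        p = h[action].get(ret, 0.0)
        # A hypothesis that assigns zero probability to the observation is ruled out.
        new_log_w.append(lw + math.log(p) if p > 0 else -math.inf)
    return new_log_w


def mixture_q(weights: List[float], hyps: List[DistQ], action: int) -> float:
    """Posterior-mixture expected return: sum_h w_h * E_h[Z(action)]."""
    return sum(
        w * sum(r * p for r, p in h[action].items())
        for w, h in zip(weights, hyps)
    )


def greedy_action(weights: List[float], hyps: List[DistQ], actions: List[int]) -> int:
    """Pick the action with the highest mixture expected return."""
    return max(actions, key=lambda a: mixture_q(weights, hyps, a))


def demo():
    hyps: List[DistQ] = [
        {0: {1.0: 0.8, 0.0: 0.2}, 1: {1.0: 0.2, 0.0: 0.8}},  # true hypothesis (grain of truth)
        {0: {1.0: 0.2, 0.0: 0.8}, 1: {1.0: 0.8, 0.0: 0.2}},  # wrong: swaps the actions
        {0: {1.0: 0.5, 0.0: 0.5}, 1: {1.0: 0.5, 0.0: 0.5}},  # wrong: uninformative
    ]
    prior = [0.25, 0.5, 0.25]  # hand-picked; favors a wrong hypothesis
    log_w = [math.log(p) for p in prior]
    actions = [0, 1]
    greedy_before = greedy_action(normalize(log_w), hyps, actions)

    # Deterministic returns whose frequencies match the true hypothesis:
    # action 0 yields return 1.0 in 4 of 5 steps, action 1 in 1 of 5 steps.
    patterns = {0: [1.0, 1.0, 1.0, 1.0, 0.0], 1: [0.0, 0.0, 0.0, 0.0, 1.0]}
    for _ in range(10):
        for i in range(5):
            for a in actions:  # fixed alternating exploration schedule
                log_w = update(log_w, hyps, a, patterns[a][i])

    posterior = normalize(log_w)
    return greedy_before, posterior, greedy_action(posterior, hyps, actions)


if __name__ == "__main__":
    before, post, after = demo()
    print("greedy action under prior:", before)
    print("posterior:", [round(w, 6) for w in post])
    print("greedy action under posterior:", after)
```

Under the prior, the mixture prefers action 1, which is suboptimal. The observed returns, gathered here by a fixed exploration schedule, shift nearly all posterior weight onto the true hypothesis, after which the greedy action is action 0. The sketch does not address how exploration and optimality are handled in the general, history-dependent setting the paper studies.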