Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how the choice of supervised learning subroutine ("oracle") determines the computational complexity of reinforcement learning. Focusing on two structured settings, Block MDPs and Low-Rank MDPs, the authors give the first necessary-and-sufficient classification of oracle requirements. In the standard episodic access model, they prove that two-context regression is a minimal oracle (both necessary and sufficient) for reward-free exploration in Block MDPs; in the stronger reset access model, one-context regression is near-minimal, which quantifies a provable computational benefit of resets. Via a cryptographic reduction, they further give rigorous evidence that one-context regression is insufficient for Low-Rank MDPs. Together, these results show that assumptions about oracle access critically determine computational feasibility in RL, and they yield tight theoretical boundaries for the supervised learning primitives (such as value or transition function estimation) that underpin sample-efficient policy learning.

📝 Abstract
Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning "oracles" it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a minimal oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model -- a ubiquitous setting for RL with function approximation -- we identify two-context regression as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify one-context regression as a near-minimal oracle in the stronger reset access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to Low-Rank MDPs, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.
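To make the abstract's oracle terminology concrete, here is a minimal sketch of what one-context and two-context regression oracles might look like as empirical-risk-minimization interfaces. The function names, signatures, and the brute-force minimization over a finite function class are illustrative assumptions for exposition, not the paper's implementation; the paper treats these oracles abstractly as black boxes.

```python
from typing import Callable, List, Tuple

# Illustrative sketch (not the paper's code): a one-context regression
# oracle fits f(x) ~ E[y | x] over a function class; a two-context
# oracle fits g(x1, x2) ~ E[y | x1, x2] over pairs of contexts.
# Here the "oracle" is realized as least-squares ERM over a finite class.

def one_context_regression(
    data: List[Tuple[object, float]],
    function_class: List[Callable[[object], float]],
) -> Callable[[object], float]:
    """Return the function in the class minimizing squared error on data."""
    return min(
        function_class,
        key=lambda f: sum((f(x) - y) ** 2 for x, y in data),
    )

def two_context_regression(
    data: List[Tuple[object, object, float]],
    function_class: List[Callable[[object, object], float]],
) -> Callable[[object, object], float]:
    """Same least-squares fit, but each example carries two contexts."""
    return min(
        function_class,
        key=lambda g: sum((g(x1, x2) - y) ** 2 for x1, x2, y in data),
    )
```

The distinction the paper quantifies is that the two-context oracle receives pairs of contexts per example, which is strictly more expressive as a primitive; the results above classify when each strength of oracle is necessary or sufficient for efficient exploration.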
Problem

Research questions and friction points this paper is trying to address.

Identify minimal supervised learning oracles
Assess oracle impact on RL complexity
Explore oracle sufficiency in different MDPs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-context regression as minimal oracle
One-context regression near-minimal with resets
Cryptographic evidence for Low-Rank MDP insufficiency