🤖 AI Summary
This paper surveys the theory of offline reinforcement learning in large state spaces, where the goal is to learn high-performing policies from historical data alone, without further interaction with the environment. Methodologically, it carefully distinguishes two expressivity assumptions on function approximation, Bellman completeness and realizability, and characterizes data-coverage conditions, including all-policy and single-policy coverage. It develops a unified analytical framework linking coverage assumptions, algorithm design, and theoretical guarantees. Drawing on dynamic programming and statistical learning theory, the paper presents several function-approximation-based offline RL algorithms and, within this unified perspective, derives sample- and computational-complexity bounds for them. The results delineate the feasibility frontier of offline learning under the various combinations of expressivity and coverage assumptions, providing theory-grounded guidance for algorithm selection and practical deployment.
📝 Abstract
This article introduces the theory of offline reinforcement learning in large state spaces, where good policies are learned from historical data without online interactions with the environment. Key concepts introduced include expressivity assumptions on function approximation (e.g., Bellman completeness vs. realizability) and data coverage (e.g., all-policy vs. single-policy coverage). A rich landscape of algorithms and results is described, depending on the assumptions one is willing to make and the sample and computational complexity guarantees one wishes to achieve. We also discuss open questions and connections to adjacent areas.
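To make the abstract's ingredients concrete, here is a minimal, self-contained sketch of fitted Q-iteration, a canonical function-approximation-based offline RL algorithm of the kind the paper analyzes. Everything below (the toy chain MDP, the dataset size, the reward) is a hypothetical example, not taken from the paper; note how the learned policy can only be trusted on state-action pairs the offline data covers, which is exactly where the coverage assumptions enter.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

def collect_dataset(n=2000):
    """Offline dataset of (s, a, r, s') tuples from a uniform behavior policy.
    Hypothetical dynamics: action 1 moves right (mod n_states), action 0 stays;
    reward 1 for landing in the last state."""
    s = rng.integers(n_states, size=n)
    a = rng.integers(n_actions, size=n)
    s_next = np.where(a == 1, (s + 1) % n_states, s)
    r = (s_next == n_states - 1).astype(float)
    return s, a, r, s_next

def fqi(dataset, n_iters=50):
    """Fitted Q-iteration: repeatedly regress onto Bellman backup targets.
    Here the 'regression' is exact per-(s, a) averaging (a tabular function
    class, which is trivially Bellman complete); in large state spaces one
    would fit a parametric class instead, and the quality of that fit is
    where expressivity assumptions bite."""
    s, a, r, s_next = dataset
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        y = r + gamma * q[s_next].max(axis=1)  # Bellman optimality targets
        q_new = np.zeros_like(q)
        for si in range(n_states):
            for ai in range(n_actions):
                mask = (s == si) & (a == ai)
                if mask.any():  # pairs never visited in the data stay at 0
                    q_new[si, ai] = y[mask].mean()
        q = q_new
    return q

q = fqi(collect_dataset())
policy = q.argmax(axis=1)  # greedy policy from the learned Q-function
```

With uniform data every (s, a) pair is covered, so FQI recovers the optimal policy (move right until the rewarding state, then stay). Under narrower, single-policy coverage, the unvisited entries above would stay at their initialization, and more conservative (e.g. pessimistic) algorithms become necessary.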