🤖 AI Summary
Addressing the challenges of generalization and hyperparameter sensitivity in reinforcement learning (RL), this paper introduces MR.Q, a model-free RL algorithm that requires no environment model and no per-domain hyperparameter tuning. Methodologically, MR.Q brings model-based representation learning into a model-free framework: features are trained with model-based objectives so that the value function becomes approximately linear in the learned representation. This lets the agent exploit the denser learning signals of model-based RL while avoiding the computational overhead of explicit planning or simulated trajectories. Empirically, MR.Q uses a single fixed hyperparameter configuration across diverse standard benchmarks, including Atari, the DeepMind Control Suite, and locomotion tasks, and achieves performance competitive with both domain-specific algorithms and leading general-purpose baselines. These results are presented as a concrete step toward general-purpose model-free deep RL, showing that broad cross-domain competence is attainable without planning or per-benchmark tuning.
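The core idea, a value function that is approximately linear in features learned with model-based objectives, can be illustrated with a toy numpy sketch. Everything below is a hypothetical, simplified stand-in, not the MR.Q implementation: the encoder is a fixed random projection instead of a trained deep network, and the linear weights are fit by least squares to placeholder targets instead of TD learning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration only.
STATE_DIM, ACTION_DIM, EMBED_DIM = 4, 2, 8

# Fixed random linear "encoder" standing in for the learned network
# phi(s, a). In MR.Q this encoder is trained with model-based
# objectives; here it is frozen so the linearity of Q is explicit.
W_enc = rng.normal(size=(STATE_DIM + ACTION_DIM, EMBED_DIM))

def phi(state, action):
    """Embed a state-action pair; tanh keeps features bounded."""
    return np.tanh(np.concatenate([state, action]) @ W_enc)

# With the features fixed, the value function is linear in the
# embedding: Q(s, a) ~= w^T phi(s, a). Fit w by least squares to
# random placeholder targets (standing in for Bellman targets).
X = np.stack([
    phi(rng.normal(size=STATE_DIM), rng.normal(size=ACTION_DIM))
    for _ in range(256)
])
targets = rng.normal(size=256)  # placeholder regression targets
w, *_ = np.linalg.lstsq(X, targets, rcond=None)

# Querying the value of a new state-action pair is a dot product.
s, a = rng.normal(size=STATE_DIM), rng.normal(size=ACTION_DIM)
q_value = phi(s, a) @ w
```

The point of the sketch is the structure, not the numbers: once the representation is in place, value estimation reduces to a linear read-out, which is what lets the method avoid planning or trajectory simulation at decision time.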
📝 Abstract
Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.