Actor-Critic without Actor

📅 2025-09-25
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Traditional actor-critic methods rely on separate actor and critic networks, resulting in architecture sensitivity, cumbersome hyperparameter tuning, and poor scalability; diffusion-based policies offer multimodal action representation but incur high computational overhead and deployment inefficiency. This paper proposes a lightweight online reinforcement learning framework that **eliminates the explicit actor network entirely**, instead constructing a noise-aware critic and generating actions directly from its gradient field, thereby tightly coupling policy improvement with value estimation. The approach requires neither diffusion models nor auxiliary samplers; action generation is guided solely by critic gradients, preserving multimodal behavioral modeling while drastically reducing algorithmic complexity and computational cost. On standard online RL benchmarks, the method achieves faster convergence and superior performance compared to classical actor-critic and state-of-the-art diffusion-based policy methods, demonstrating both conceptual simplicity and empirical effectiveness.

๐Ÿ“ Abstract
Actor-critic methods constitute a central paradigm in reinforcement learning (RL), coupling policy evaluation with policy improvement. While effective across many domains, these methods rely on separate actor and critic networks, which makes training vulnerable to architectural decisions and hyperparameter tuning. Such complexity limits their scalability in settings that require large function approximators. Recently, diffusion models have been proposed as expressive policies that capture multi-modal behaviors and improve exploration, but they introduce additional design choices and computational burdens, hindering efficient deployment. We introduce Actor-Critic without Actor (ACA), a lightweight framework that eliminates the explicit actor network and instead generates actions directly from the gradient field of a noise-level critic. This design removes the algorithmic and computational overhead of actor training while keeping policy improvement tightly aligned with the critic's latest value estimates. Moreover, ACA retains the ability to capture diverse, multi-modal behaviors without relying on diffusion-based actors, combining simplicity with expressiveness. Through extensive experiments on standard online RL benchmarks, ACA achieves more favorable learning curves and competitive performance compared to both standard actor-critic and state-of-the-art diffusion-based methods, providing a simple yet powerful solution for online RL.
Problem

Research questions and friction points this paper is trying to address.

Eliminates separate actor networks to reduce architectural complexity in RL
Avoids computational burdens of diffusion models while maintaining policy expressiveness
Addresses scalability limitations of traditional actor-critic methods with large approximators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates actions from critic's gradient field
Eliminates explicit actor network entirely
Combines simplicity with multi-modal behavior capability
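The actor-free idea above can be sketched in a few lines: start an action from Gaussian noise and refine it by gradient ascent on the critic, so that the critic's gradient field alone drives action generation. The following PyTorch snippet is a minimal illustrative sketch, not the paper's exact algorithm; the names `Critic` and `sample_action`, the single noise level, and the fixed step size are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """A simple Q(s, a) network; stands in for the paper's noise-aware critic."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def sample_action(critic: Critic, state: torch.Tensor, action_dim: int,
                  steps: int = 20, step_size: float = 0.1) -> torch.Tensor:
    """Generate an action without an actor: ascend the critic's gradient field.

    Starts from Gaussian noise (multiple restarts would recover multi-modality)
    and iteratively moves the action along grad_a Q(s, a).
    """
    action = torch.randn(state.shape[0], action_dim, requires_grad=True)
    for _ in range(steps):
        q = critic(state, action).sum()
        (grad,) = torch.autograd.grad(q, action)
        with torch.no_grad():
            action += step_size * grad          # gradient ascent on Q
            action.clamp_(-1.0, 1.0)            # keep actions in a bounded box
    return action.detach()
```

Because each starting noise sample can converge to a different local maximum of Q, repeated sampling can express multiple behavioral modes, which is the intuition behind keeping multi-modality without a diffusion actor.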