AI Summary
To address the challenge of simultaneously achieving high expected return and low tail risk in safety-critical offline reinforcement learning, this paper proposes RAMAC, a novel risk-aware framework. RAMAC introduces expressive generative actors based on diffusion models and flow matching into risk-sensitive offline RL for the first time, jointly optimizing them with a distributional critic and a behavior-cloning constraint to enable differentiable, end-to-end risk-aware policy learning. Unlike conventional conservative policy methods, RAMAC achieves a superior trade-off between performance and safety. Empirical evaluation on Stochastic-D4RL benchmarks demonstrates that RAMAC improves CVaR$_{0.1}$ by 12–35% across most tasks while maintaining state-of-the-art average returns. These results validate its effectiveness and safety in complex, multimodal environments.
Abstract
In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative, but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of value conservatism and restricted policy classes, whereas expressive policies have only been used in risk-neutral settings. Here, we address this gap by introducing the **Risk-Aware Multimodal Actor-Critic (RAMAC)** framework, which couples an *expressive generative actor* with a distributional critic. RAMAC differentiates a composite objective, combining distributional risk and a behavior-cloning (BC) loss, through the generative path, achieving risk-sensitive learning in complex multimodal scenarios. We instantiate RAMAC with diffusion and flow-matching actors and observe consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns on most Stochastic-D4RL tasks. Code: https://github.com/KaiFukazawa/RAMAC.git
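The evaluation metric above, $\mathrm{CVaR}_{0.1}$, is the mean of the worst 10% of episode returns, so it rewards policies whose lower tail is safe rather than just whose average is high. The paper does not ship this snippet; it is a minimal empirical sketch, with the function name `cvar` and the sample returns chosen here for illustration:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Lower-tail CVaR_alpha: mean of the worst alpha-fraction of returns.

    Sorts returns in ascending order and averages the bottom
    ceil(alpha * n) of them (at least one).
    """
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))
    return float(r[:k].mean())

# Hypothetical episode returns from two policies with the same mean (5.5):
# policy A is consistent; policy B occasionally fails catastrophically.
policy_a = [5.0, 5.5, 6.0, 5.5, 5.0, 6.0, 5.5, 5.0, 6.0, 5.5]
policy_b = [9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, -8.0, -9.0]

print(cvar(policy_a, alpha=0.1))  # tail of A stays near its mean
print(cvar(policy_b, alpha=0.1))  # tail of B exposes the rare failures
```

A risk-neutral comparison of average returns would rank the two policies as equal; the CVaR$_{0.1}$ comparison separates them, which is exactly the distinction the benchmark results rely on.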