AI Summary
To address the challenge of simultaneously achieving high expected return and low tail risk in safety-critical offline reinforcement learning, this paper proposes RAMAC, a novel risk-aware framework. RAMAC introduces expressive generative actors based on diffusion models and flow matching into risk-sensitive offline RL for the first time, jointly optimizing them with a distributional critic and a behavior-cloning constraint to enable differentiable, end-to-end risk-aware policy learning. Unlike conventional conservative policy methods, RAMAC achieves a superior trade-off between performance and safety. Empirical evaluation on Stochastic-D4RL benchmarks demonstrates that RAMAC improves CVaR$_{0.1}$ by 12–35% across most tasks while maintaining state-of-the-art average returns. These results validate its effectiveness and safety in complex, multimodal environments.
Abstract
In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative, but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of value conservatism and restricted policy classes, whereas expressive policies have only been used in risk-neutral settings. Here, we address this gap by introducing the **Risk-Aware Multimodal Actor-Critic (RAMAC)** framework, which couples an *expressive generative actor* with a distributional critic. RAMAC differentiates a composite objective, combining distributional risk and a behavior-cloning (BC) loss, through the generative path, achieving risk-sensitive learning in complex multimodal scenarios. We instantiate RAMAC with diffusion and flow-matching actors and observe consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns on most Stochastic-D4RL tasks. Code: https://github.com/KaiFukazawa/RAMAC.git
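The evaluation metric above, $\mathrm{CVaR}_{0.1}$, is the mean of the worst 10% of episode returns, so it rewards policies whose lower tail is safe rather than just whose average is high. The paper does not ship this snippet; it is a minimal empirical sketch, with the function name `cvar` and the sample returns chosen here for illustration:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Lower-tail CVaR_alpha: mean of the worst alpha-fraction of returns.

    Sorts returns in ascending order and averages the bottom
    ceil(alpha * n) of them (at least one).
    """
    r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(r))))
    return float(r[:k].mean())

# Hypothetical episode returns from two policies with the same mean (5.5):
# policy A is consistent; policy B occasionally fails catastrophically.
policy_a = [5.0, 5.5, 6.0, 5.5, 5.0, 6.0, 5.5, 5.0, 6.0, 5.5]
policy_b = [9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, -8.0, -9.0]

print(cvar(policy_a, alpha=0.1))  # tail of A stays near its mean
print(cvar(policy_b, alpha=0.1))  # tail of B exposes the rare failures
```

A risk-neutral comparison of average returns would rank the two policies as equal; the CVaR$_{0.1}$ comparison separates them, which is exactly the distinction the benchmark results rely on.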