RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization

📅 2025-10-02
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of simultaneously achieving high expected return and low tail risk in safety-critical offline reinforcement learning, this paper proposes RAMAC, a novel risk-aware framework. RAMAC introduces expressive generative actors based on diffusion models and flow matching into risk-sensitive offline RL for the first time, coupling them with a distributional critic and a behavior-cloning constraint so that a composite risk-aware objective can be differentiated end-to-end through the generative path. Unlike conventional conservative policy methods, RAMAC achieves a superior trade-off between performance and safety. Empirical evaluation on Stochastic-D4RL benchmarks shows that RAMAC improves CVaR₀.₁ by 12–35% on most tasks while maintaining state-of-the-art average returns, validating its effectiveness and safety in complex, multimodal environments.
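The CVaR₀.₁ metric used throughout can be estimated empirically as the mean of the worst 10% of episode returns. A minimal sketch (the `cvar` function name and the sample data are ours, not from the paper's codebase):

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: the mean of the worst alpha-fraction of returns.

    CVaR_0.1 averages the lowest 10% of episode returns, so it measures
    lower-tail risk rather than average performance.
    """
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending order
    k = max(1, int(np.ceil(alpha * len(returns))))       # size of the worst tail
    return float(returns[:k].mean())

# Example with 10 hypothetical episode returns: the worst 10% is one episode.
episode_returns = [5.0, 7.0, 9.0, 2.0, 8.0, 6.0, 10.0, 4.0, 3.0, 1.0]
print(cvar(episode_returns, alpha=0.1))  # → 1.0
```

A policy can thus score well on average return while scoring poorly on CVaR₀.₁, which is exactly the gap the paper targets.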

📝 Abstract
In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative, but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of value conservatism and restricted policy classes, whereas expressive policies are used only in risk-neutral settings. Here, we address this gap by introducing the Risk-Aware Multimodal Actor-Critic (RAMAC) framework, which couples an expressive generative actor with a distributional critic. RAMAC differentiates a composite objective, combining a distributional risk measure with a behavior-cloning (BC) loss, through the generative path, achieving risk-sensitive learning in complex multimodal scenarios. We instantiate RAMAC with diffusion and flow-matching actors and observe consistent gains in CVaR₀.₁ while maintaining strong returns on most Stochastic-D4RL tasks. Code: https://github.com/KaiFukazawa/RAMAC.git
Problem

Research questions and friction points this paper is trying to address.

Addresses risk-averse offline reinforcement learning limitations
Combines expressive generative policies with distributional risk objectives
Achieves safety without sacrificing performance in multimodal scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expressive generative actor with distributional critic
Differentiates composite objective through generative path
Uses diffusion and flow-matching actors
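The composite objective named in these bullets can be illustrated numerically. This is a hedged sketch, not the paper's implementation: the function and variable names are ours, the critic is stood in for by a fixed set of return quantiles, and the real method differentiates this loss through the diffusion/flow-matching sampling path rather than evaluating it on static arrays.

```python
import numpy as np

def composite_actor_loss(critic_quantiles, action, behavior_action,
                         eta=1.0, alpha=0.1):
    """Illustrative composite actor objective: risk term + BC term.

    critic_quantiles: sampled return quantiles Z(s, a) that a distributional
    critic would produce for the actor's action.
    The actor maximizes CVaR_alpha of the return distribution while staying
    close to the dataset action, i.e. it minimizes
        L = -CVaR_alpha(Z(s, a)) + eta * ||a - a_data||^2.
    """
    q = np.sort(np.asarray(critic_quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))
    cvar_term = q[:k].mean()                 # lower-tail expected return
    diff = np.asarray(action, dtype=float) - np.asarray(behavior_action, dtype=float)
    bc_term = float(np.sum(diff ** 2))       # behavior-cloning penalty
    return float(-cvar_term + eta * bc_term)

# Matching the dataset action leaves only the (negated) risk term.
print(composite_actor_loss(list(range(1, 11)), [0.5], [0.5]))  # → -1.0
```

The weight `eta` (a name we chose here) controls the performance/safety trade-off: larger values keep the actor closer to the behavior data, smaller values let the risk term dominate.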
Authors

Kai Fukazawa
Department of Mechanical and Aerospace Engineering, University of California, Davis

Kunal Mundada
Department of Computer Science, University of California, Davis

Iman Soltani
Assistant Professor of Mechanical and Aerospace Engineering, University of California, Davis
Research interests: Robotics, Autonomous Driving, Deep Learning for Medical Diagnosis, Instrumentation and