🤖 AI Summary
To address the conflict between likelihood-based sampling and goal-directed exploration in generative models for crystal material discovery, along with the challenge of jointly optimizing novelty, thermodynamic stability, and structural diversity, this work proposes a reinforcement learning framework based on Group Relative Policy Optimization (GRPO). The method combines a latent-space denoising diffusion model with a multi-objective reward mechanism that jointly enforces chemical validity, thermodynamic stability, and functionally oriented structural diversity. Key innovations include: (i) the application of GRPO to materials generation, enabling efficient, targeted exploration of high-dimensional crystal configuration spaces; and (ii) the use of verifiable reward functions that eliminate post-hoc filtering and improve inverse design efficiency. Experiments yield multiple previously unreported, energetically stable crystal structures. Under strict chemical validity constraints, the generated structures achieve better novelty and stability metrics than state-of-the-art baselines.
📝 Abstract
Discovering functional crystalline materials entails navigating an immense combinatorial design space. While recent advances in generative artificial intelligence have enabled the sampling of chemically plausible compositions and structures, a fundamental challenge remains: the objectives are misaligned, because generative modelling samples by likelihood, whereas discovery requires a targeted focus on underexplored regions where novel compounds reside. Here, we introduce a reinforcement learning framework that guides latent denoising diffusion models toward diverse and novel, yet thermodynamically viable crystalline compounds. Our approach integrates group relative policy optimisation with verifiable, multi-objective rewards that jointly balance creativity, stability, and diversity. Beyond de novo generation, we demonstrate enhanced property-guided design that preserves chemical validity while targeting desired functional properties. This approach establishes a modular foundation for controllable AI-driven inverse design that addresses the novelty-validity trade-off across scientific discovery applications of generative models.
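To make the core idea concrete, the following is a minimal sketch of how a multi-objective reward might be combined with group-relative advantages. It is not the authors' implementation: the `Scores` fields, the equal weights, the zero reward for invalid samples, and the function names are all illustrative assumptions, and the scores themselves are made-up numbers.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Scores:
    """Hypothetical per-sample scores, each assumed to lie in [0, 1]."""
    valid: bool       # passed chemical validity checks
    novelty: float
    stability: float
    diversity: float


def combined_reward(s, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the objectives; invalid samples get zero (assumed gating)."""
    if not s.valid:
        return 0.0
    w_nov, w_stab, w_div = weights
    return w_nov * s.novelty + w_stab * s.stability + w_div * s.diversity


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise rewards within one sampled group (mean 0, unit scale)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample has the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of candidate structures sampled for the same conditioning input
group = [
    Scores(True, 0.9, 0.4, 0.5),
    Scores(True, 0.2, 0.9, 0.3),
    Scores(False, 0.8, 0.8, 0.8),  # invalid, so its reward is gated to zero
    Scores(True, 0.6, 0.7, 0.6),
]
rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

In a scheme of this kind, samples with positive advantage would be reinforced and those with negative advantage suppressed, so the policy is scored relative to its own group rather than against a learned value baseline. How the actual rewards are defined and weighted is not specified in the abstract.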