Guiding Generative Models to Uncover Diverse and Novel Crystals via Reinforcement Learning

📅 2025-11-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Generative models for crystal discovery sample according to likelihood, which conflicts with goal-directed exploration of underexplored regions, and they must also balance novelty, thermodynamic stability, and structural diversity at the same time. This work proposes a reinforcement learning framework based on Group Relative Policy Optimization (GRPO) to address both problems. The method couples a latent-space denoising diffusion model with verifiable, multi-objective rewards that jointly encourage novelty, thermodynamic stability, and diversity while preserving chemical validity. According to this summary, the key contributions are: (i) applying GRPO to materials generation, which allows targeted exploration of the high-dimensional crystal configuration space; and (ii) using verifiable reward functions during training, which reduces reliance on post-hoc filtering in inverse design. The summary reports that the framework produces previously unreported, energetically stable crystal structures and that, under chemical validity constraints, the generated structures improve on baseline models in novelty and stability. Beyond de novo generation, the approach also supports property-guided design.

📝 Abstract

Discovering functional crystalline materials entails navigating an immense combinatorial design space. While recent advances in generative artificial intelligence have enabled the sampling of chemically plausible compositions and structures, a fundamental challenge remains: the objective misalignment between likelihood-based sampling in generative modelling and targeted focus on underexplored regions where novel compounds reside. Here, we introduce a reinforcement learning framework that guides latent denoising diffusion models toward diverse and novel, yet thermodynamically viable crystalline compounds. Our approach integrates group relative policy optimisation with verifiable, multi-objective rewards that jointly balance creativity, stability, and diversity. Beyond de novo generation, we demonstrate enhanced property-guided design that preserves chemical validity, while targeting desired functional properties. This approach establishes a modular foundation for controllable AI-driven inverse design that addresses the novelty-validity trade-off across scientific discovery applications of generative models.

Problem

Research questions and friction points this paper is trying to address.

Addressing misalignment between likelihood-based sampling and novel compound discovery

Guiding generative models to uncover diverse yet thermodynamically viable crystals

Resolving the novelty-validity trade-off in AI-driven inverse materials design

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning guides latent diffusion models

Group relative policy optimization balances multiple reward objectives

Modular AI framework enables controllable inverse design
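
As a rough illustration of how group relative policy optimisation can combine several reward objectives, the sketch below scores a group of sampled crystals and standardises the rewards within that group to obtain advantages. It is a minimal sketch, not the authors' implementation: the weighted-sum reward, the weight values, the example scores, and the function names `combined_reward` and `group_relative_advantages` are all assumptions made here for illustration.

```python
from statistics import mean, pstdev

# Hypothetical weights for the three reward terms; the source gives no values.
WEIGHTS = {"novelty": 1.0, "stability": 1.0, "diversity": 1.0}


def combined_reward(scores, weights=WEIGHTS):
    """Weighted sum of per-objective scores for one generated crystal.

    A weighted sum is an assumption; the paper may combine objectives differently.
    """
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise rewards within one group of samples.

    Each sample's advantage is its reward minus the group mean, divided by
    the group standard deviation, so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up scores for a group of four samples drawn from the same policy.
group = [
    {"novelty": 0.9, "stability": 0.2, "diversity": 0.5},
    {"novelty": 0.4, "stability": 0.8, "diversity": 0.6},
    {"novelty": 0.7, "stability": 0.7, "diversity": 0.9},
    {"novelty": 0.1, "stability": 0.9, "diversity": 0.2},
]

rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.2f}")
```

Samples with above-average combined reward receive positive advantages and are reinforced, and those below average are discouraged. In a full training loop these advantages would weight the policy-gradient update of the diffusion model, which this sketch omits.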
