🤖 AI Summary
This work addresses the challenge of explicitly guiding crystal structure generation toward target physical properties, such as energy, during inference in continuous-time generative models. To this end, we introduce reinforcement learning into crystal structure prediction for the first time, proposing a policy-gradient framework that operates directly on the velocity field learned by a flow-matching model. By incorporating stochastic perturbations of the generation dynamics, our approach enables property-guided generation and efficient exploration without requiring explicit computation of the score function. The method reinforces an energy-based objective while preserving structural diversity through composition conditioning, and it achieves performance competitive with score-based reinforcement learning approaches. It can further learn a time-dependent velocity-annealing schedule, improving sampling efficiency, and hence generation speed, by an order of magnitude.
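To make the mechanism concrete, here is a minimal PyTorch sketch of the general technique: inject Gaussian noise into the Euler integration of a pretrained velocity field so that every step becomes a Gaussian transition with a closed-form log-density, then apply a REINFORCE-style policy gradient to that log-density. This is an illustration under stated assumptions, not the paper's implementation; `velocity_net`, `sigma`, and `reward_fn` are hypothetical names, and `x0` is assumed to be a batch of prior samples of shape `(batch, features)`.

```python
import torch

def sample_with_logprob(velocity_net, x0, n_steps=100, sigma=0.1):
    """Euler-Maruyama integration of the stochastically perturbed flow.

    Each step is a Gaussian transition N(x + v * dt, sigma^2 * dt), so the
    trajectory has a closed-form log-probability and the sampler acts as a
    stochastic policy; no score function is ever evaluated.
    """
    dt = 1.0 / n_steps
    x = x0
    logp = torch.zeros(x0.shape[0], device=x0.device)
    for k in range(n_steps):
        t = torch.full((x0.shape[0],), k * dt, device=x0.device)
        mean = x + velocity_net(x, t) * dt   # drift from the learned velocity field
        std = sigma * dt ** 0.5              # exploration noise scale
        # sample the next state detached from the graph, so the gradient
        # flows through the transition density, not the sample (REINFORCE)
        x = (mean + std * torch.randn_like(mean)).detach()
        logp = logp + torch.distributions.Normal(mean, std).log_prob(x).flatten(1).sum(-1)
    return x, logp

def reinforce_step(velocity_net, optimizer, x0, reward_fn):
    """One policy-gradient update that increases the expected reward E[R]."""
    x, logp = sample_with_logprob(velocity_net, x0)
    reward = reward_fn(x)                    # e.g., negative predicted energy
    advantage = reward - reward.mean()       # simple baseline for variance reduction
    loss = -(advantage.detach() * logp).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward.mean().item()
```

The noise scale `sigma` governs the exploration-exploitation trade-off: as `sigma` approaches zero, the sampler recovers the deterministic flow, which is why perturbing the dynamics can leave the pretrained model's baseline behavior intact.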
📝 Abstract
Continuous-time generative models for crystalline materials enable inverse materials design by learning to predict stable crystal structures, but incorporating explicit target properties into the generative process remains challenging. Policy-gradient reinforcement learning (RL) provides a principled mechanism for aligning generative models with downstream objectives but typically requires access to the score, which has prevented its application to flow-based models that learn only velocity fields. We introduce Open Materials Generation with Inference-time Reinforcement Learning (OMatG-IRL), a policy-gradient RL framework that operates directly on the learned velocity fields and eliminates the need for explicit computation of the score. OMatG-IRL leverages stochastic perturbations of the underlying generation dynamics, preserving the baseline performance of the pretrained generative model while enabling exploration and policy-gradient estimation at inference time. Using OMatG-IRL, we present the first application of RL to crystal structure prediction (CSP). Our method enables effective reinforcement of an energy-based objective while preserving diversity through composition conditioning, and it achieves performance competitive with score-based RL approaches. Finally, we show that OMatG-IRL can learn time-dependent velocity-annealing schedules, enabling accurate CSP with order-of-magnitude improvements in sampling efficiency and a corresponding reduction in generation time.
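The velocity-annealing result admits a similarly simple parameterization; the sketch below is one plausible reading under the same assumptions (the class name, network size, and initialization are illustrative, not taken from the paper): a small MLP maps integration time t to a positive factor alpha(t) that rescales the frozen velocity field, and only the schedule's few parameters are trained with the policy gradient, allowing the sampler to take far fewer integration steps.

```python
import torch
import torch.nn as nn

class AnnealingSchedule(nn.Module):
    """Learnable time-dependent scaling alpha(t) of a frozen velocity field.

    Only these few parameters are trained by the policy gradient; the
    pretrained flow-matching model itself is left untouched.
    """

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, 1))
        # zero-initialize the head so alpha(t) = 1 at the start of training,
        # i.e., the sampler begins as the unmodified pretrained dynamics
        nn.init.zeros_(self.net[-1].weight)
        nn.init.zeros_(self.net[-1].bias)

    def forward(self, t):
        return torch.exp(self.net(t.unsqueeze(-1))).squeeze(-1)  # positive by construction

def fast_sample(velocity_net, schedule, x0, n_steps=10, sigma=0.05):
    """Coarse integration with the annealed velocity: far fewer steps."""
    dt = 1.0 / n_steps
    x = x0
    for k in range(n_steps):
        t = torch.full((x0.shape[0],), k * dt, device=x0.device)
        alpha = schedule(t).view(-1, *([1] * (x0.dim() - 1)))  # broadcast over features
        x = x + alpha * velocity_net(x, t) * dt + sigma * dt ** 0.5 * torch.randn_like(x)
    return x
```

Training would follow the same REINFORCE recipe as in the earlier sketch, with `velocity_net` frozen and only `schedule.parameters()` passed to the optimizer; `fast_sample` would then need the same per-step log-probability bookkeeping as `sample_with_logprob`.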