🤖 AI Summary
Problem: Crystal structure generation for inorganic materials design is inefficient, and it is difficult to achieve thermodynamic stability and target-oriented functionality at the same time. Method: We propose a reinforcement learning framework that combines a Wyckoff-position-based text representation with Direct Preference Optimization (DPO), explicitly encoding space-group symmetry into the language model's input and enabling preference-aligned conditional and unconditional generation with Qwen-2.5-7B. Contribution/Results: The approach moves away from conventional trial-and-error paradigms and generates thermodynamically stable, unique, and novel structures at a roughly 50% greater rate than prior methods. Relative to fine-tuning alone, iterative DPO improves unconditional generation by about 115% and space-group-conditioned generation by about 50%. This work provides the first systematic validation that large language models, when guided by structural priors, are efficient and reliable for materials inverse design.
📝 Abstract
Discovering novel materials is critical for technological advancements such as solar cells, batteries, and carbon capture. However, the development of new materials is constrained by a slow and expensive trial-and-error process. To accelerate this pipeline, we introduce PLaID++, a Large Language Model (LLM) fine-tuned for stable and property-guided crystal generation. We fine-tune Qwen-2.5 7B to generate crystal structures using a novel Wyckoff-based text representation. We show that generation can be effectively guided with a reinforcement learning technique based on Direct Preference Optimization (DPO), with sampled structures categorized by their stability, novelty, and space group. By encoding symmetry constraints directly into text and guiding model outputs towards desirable chemical space, PLaID++ generates structures that are thermodynamically stable, unique, and novel at a $\sim$50% greater rate than prior methods and conditionally generates structures with desired space group properties. Our experiments highlight the effectiveness of iterative DPO, achieving $\sim$115% and $\sim$50% improvements in unconditional and space group conditioned generation, respectively, compared to fine-tuning alone. Our work demonstrates the potential of adapting post-training techniques from natural language processing to materials design, paving the way for targeted and efficient discovery of novel materials.
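For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not taken from the PLaID++ code: the function name `dpo_loss`, the default `beta=0.1`, and the example log-probabilities are illustrative assumptions. In a setting like the one described above, the "chosen" sequence could be the text of a sampled structure judged preferable (for example, stable and novel) and the "rejected" sequence a less desirable sample.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each sequence under the
    policy being trained and under a frozen reference model.
    beta=0.1 is an assumed value, not one reported in the abstract.
    """
    # How much more the policy favors each sequence than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical numbers: the policy has shifted probability toward the
# chosen structure relative to the reference, so the loss drops below log(2).
loss = dpo_loss(-1.0, -5.0, -2.0, -2.0)
print(loss < math.log(2))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; the loss falls as the policy favors the chosen sequence more strongly than the reference does.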