Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys

📅 2025-11-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the poor synthesizability of BCC/B2-type superalloys and the reliance of conventional design approaches on empirical methods. We propose a physics-guided paradigm for large language model (LLM) preference optimization. Methodologically, we integrate scientific reward signals, derived from thermodynamic phase stability calculations (e.g., CALPHAD), into Direct Preference Optimization (DPO) in place of human feedback. The framework is implemented across open-weight LLMs (LLaMA-3.1, Gemma-2, and OLMo-2) to enable multi-objective co-optimization. Our key contributions are: (i) the first scalable, physics-consistent unified reward framework tailored for materials design; and (ii) improvements in both the predicted phase stability and the synthesizability of generated alloys. The approach establishes a generalizable methodology for physics-informed design in the physical sciences.

📝 Abstract

We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on costly heuristic or human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
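
The abstract describes replacing human preference labels with a single scalar reward computed from thermodynamic phase calculations. The sketch below shows one way such a reward could be turned into DPO training pairs; it is an illustration under stated assumptions, not the authors' pipeline. The function `build_preference_pairs`, the `min_margin` threshold, the `toy_stability_score` lookup, and the placeholder alloy labels and scores are all hypothetical. In practice the reward function would wrap an actual phase-stability calculation.

```python
from itertools import combinations
from typing import Callable, Dict, List


def build_preference_pairs(
    prompt: str,
    candidates: List[str],
    reward_fn: Callable[[str], float],
    min_margin: float = 0.05,
) -> List[Dict[str, str]]:
    """Rank model-generated candidates by a scalar reward and emit
    (prompt, chosen, rejected) records for preference tuning.

    Pairs whose rewards differ by less than `min_margin` are skipped,
    since the preference between them is too weak to be informative.
    """
    scored = [(c, reward_fn(c)) for c in candidates]
    pairs = []
    for (a, reward_a), (b, reward_b) in combinations(scored, 2):
        if abs(reward_a - reward_b) < min_margin:
            continue
        chosen, rejected = (a, b) if reward_a > reward_b else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Hypothetical stand-in for a thermodynamic reward: placeholder scores,
# not results from any real calculation.
_TOY_SCORES = {"alloy_A": 0.90, "alloy_B": 0.40, "alloy_C": 0.42}


def toy_stability_score(candidate: str) -> float:
    return _TOY_SCORES[candidate]


pairs = build_preference_pairs(
    prompt="Propose a BCC/B2 superalloy composition.",
    candidates=["alloy_A", "alloy_B", "alloy_C"],
    reward_fn=toy_stability_score,
)
for pair in pairs:
    print(pair["chosen"], ">", pair["rejected"])
```

The records use the prompt/chosen/rejected layout commonly accepted by DPO training libraries. Folding several design objectives into one scalar before ranking is what lets a single preference dataset serve a multi-objective goal.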

Problem

Research questions and friction points this paper is trying to address.

Optimizing language models for multi-objective BCC/B2 superalloy design

Using physics-based thermodynamic feedback instead of human evaluation

Developing a general framework for materials discovery across physical sciences

Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference learning optimizes language models for alloy design

Physics-based feedback replaces human or heuristic tuning methods

Direct Preference Optimization unifies multiple design objectives
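
For readers unfamiliar with Direct Preference Optimization, the per-pair objective can be sketched in a few lines. This is the standard DPO loss written in plain Python for illustration; the function name `dpo_pair_loss`, its argument names, and the default `beta` are assumptions, and the paper's actual training setup and hyperparameters are not given in this summary.

```python
import math


def dpo_pair_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being tuned
    and under the frozen reference model. The loss is
    -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).
    """
    margin = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# With the policy identical to the reference, the margin is zero.
loss_at_init = dpo_pair_loss(-5.0, -7.0, -5.0, -7.0)

# Shifting probability toward the chosen candidate lowers the loss.
loss_after_shift = dpo_pair_loss(-4.0, -8.0, -5.0, -7.0)
```

Here the "chosen" candidate is simply whichever one scores higher under the physics-based reward, so no human labeling enters the loop.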

🔎 Similar Papers

MatText: Do Language Models Need More than Text & Scale for Materials Modeling?