Preference Learning from Physics-Based Feedback: Tuning Language Models to Design BCC/B2 Superalloys

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the poor synthesizability of BCC/B2-type superalloys and the empirical reliance of conventional design approaches. We propose a novel, physics-guided paradigm for large language model (LLM) preference optimization. Methodologically, we pioneer the integration of scientific reward signals—derived from thermodynamic phase stability calculations (e.g., CALPHAD)—into Direct Preference Optimization (DPO), replacing human feedback. This framework is implemented across open-source LLMs including LLaMA-3.1, Gemma-2, and OLMo-2 to enable multi-objective co-optimization. Our key contributions are: (i) the first scalable, physics-consistent unified reward framework tailored for materials design; and (ii) substantial improvements in both predicted phase stability and experimental synthesizability of generated alloys. The approach establishes a generalizable methodology for intelligent, physics-informed design in physical sciences.

📝 Abstract
We apply preference learning to the task of language model-guided design of novel structural alloys. In contrast to prior work that focuses on generating stable inorganic crystals, our approach targets the synthesizability of a specific structural class: BCC/B2 superalloys, an underexplored family of materials with potential applications in extreme environments. Using three open-weight models (LLaMA-3.1, Gemma-2, and OLMo-2), we demonstrate that language models can be optimized for multiple design objectives using a single, unified reward signal through Direct Preference Optimization (DPO). Unlike prior approaches that rely on heuristic feedback or costly human-in-the-loop feedback, our reward signal is derived from thermodynamic phase calculations, offering a scientifically grounded criterion for model tuning. To our knowledge, this is the first demonstration of preference-tuning a language model using physics-grounded feedback for structural alloy design. The resulting framework is general and extensible, providing a path forward for intelligent design-space exploration across a range of physical science domains.
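As an illustration of how a scalar physics-based score could be turned into preference data, the sketch below ranks candidate generations by a score and emits (chosen, rejected) pairs. It is not the authors' pipeline: the function `build_preference_pairs`, the `margin` parameter, the placeholder alloy names, and the values in `TOY_SCORES` are all hypothetical, and the score lookup merely stands in for a real thermodynamic phase calculation. The `prompt`/`chosen`/`rejected` record layout is an assumption based on a format commonly used for preference datasets.

```python
from itertools import combinations


def build_preference_pairs(prompt, candidates, score_fn, margin=0.0):
    """Turn scored candidates into preference records.

    score_fn is a hypothetical callable returning one scalar per candidate
    (higher is better). In a real setting it would wrap a thermodynamic
    phase calculation; here it is only a stand-in.
    """
    scored = [(c, score_fn(c)) for c in candidates]
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scored, 2):
        # Skip near-ties: they carry little preference signal.
        if abs(score_a - score_b) <= margin:
            continue
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Hypothetical scores for placeholder candidates (not real alloy data).
TOY_SCORES = {"alloy_A": 0.90, "alloy_B": 0.20, "alloy_C": 0.88}

if __name__ == "__main__":
    records = build_preference_pairs(
        "Propose a BCC/B2 superalloy composition.",
        list(TOY_SCORES),
        TOY_SCORES.get,
        margin=0.05,
    )
    for record in records:
        print(record["chosen"], ">", record["rejected"])
```

Because every design objective is folded into the single scalar returned by `score_fn`, the pairing logic stays the same regardless of how many objectives the score combines.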
Problem

Research questions and friction points this paper is trying to address.

Optimizing language models for multi-objective BCC/B2 superalloy design
Using physics-based thermodynamic feedback instead of human evaluation
Developing a general framework for materials discovery across physical sciences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference learning optimizes language models for alloy design
Physics-based feedback replaces human or heuristic tuning methods
Direct Preference Optimization unifies multiple design objectives
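For reference, the standard DPO objective from the general preference-optimization literature is given below. It is not reproduced from this paper, and the mapping of symbols to the alloy-design setting is an assumption: $x$ is the design prompt, $y_w$ and $y_l$ are the preferred and dispreferred generations (here, presumably ranked by the thermodynamic reward), $\pi_\theta$ is the model being tuned, $\pi_{\mathrm{ref}}$ is the frozen reference model, $\beta$ is a temperature-like hyperparameter, and $\sigma$ is the logistic function.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Only the ordering of $y_w$ and $y_l$ enters the loss, so any scalar criterion that can rank two candidates can supply the preference labels.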
Satanu Ghosh
Department of Computer Science, University of New Hampshire
Collin Holgate
Materials Department, University of California, Santa Barbara
Neal R. Brodnik
Materials Department, University of California, Santa Barbara
Doug Downey
Allen Institute for Artificial Intelligence
Samantha Daly
Department of Mechanical Engineering, University of California, Santa Barbara
Tresa M. Pollock
Materials Department, University of California, Santa Barbara
Samuel Carton
Assistant Professor, University of New Hampshire
Computer science