Benchmarking Large Language Models for Polymer Property Predictions

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional polymer property prediction methods rely heavily on large-scale labeled datasets, handcrafted features, and domain-specific molecular representations, limiting generalizability and scalability. Method: This study presents the first systematic evaluation of general-purpose large language models (LLMs)—specifically LLaMA-3-8B and GPT-3.5—for predicting polymer thermal properties (glass transition, melting, and decomposition temperatures). We fine-tuned both models on a dataset of 11,740 polymers using parameter-efficient fine-tuning (PEFT), hyperparameter optimization, natural-language input encoding, and molecular embedding analysis, benchmarking against Polymer Genome, polyGNN, and polyBERT. Contribution/Results: LLaMA-3-8B achieves state-of-the-art performance among LLMs under single-task settings, attaining accuracy comparable to—but slightly below—the best traditional models; GPT-3.5 performs significantly worse. Open-source LLMs demonstrate superior tunability, yet general-purpose LLMs still struggle to capture fine-grained chemical structural details critical for precise thermal property prediction.
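The natural-language input encoding mentioned above can be illustrated with a minimal sketch: each polymer repeat-unit SMILES string and its measured property are serialized into a prompt/completion pair for instruction-style fine-tuning. The template wording, function name, and field names here are assumptions for illustration, not the paper's exact format.

```python
# Hypothetical encoding of one training record for LLM fine-tuning.
# The prompt template and units are assumptions; the paper's exact
# serialization format is not reproduced here.

PROPERTY_NAMES = {
    "Tg": "glass transition temperature",
    "Tm": "melting temperature",
    "Td": "decomposition temperature",
}

def encode_example(smiles: str, prop: str, value_k: float) -> dict:
    """Build one fine-tuning record (natural-language prompt + target)."""
    prompt = (
        f"What is the {PROPERTY_NAMES[prop]} of the polymer with "
        f"repeat unit SMILES {smiles}?"
    )
    completion = f"{value_k:.1f} K"
    return {"prompt": prompt, "completion": completion}

record = encode_example("[*]CC([*])c1ccccc1", "Tg", 373.0)
print(record["prompt"])
print(record["completion"])
```

Because the model consumes plain text, this step replaces the handcrafted fingerprinting that Polymer Genome-style pipelines require.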

📝 Abstract
Machine learning has revolutionized polymer science by enabling rapid property prediction and generative design. Large language models (LLMs) offer further opportunities in polymer informatics by simplifying workflows that traditionally rely on large labeled datasets, handcrafted representations, and complex feature engineering. LLMs leverage natural language inputs through transfer learning, eliminating the need for explicit fingerprinting and streamlining training. In this study, we fine-tune general-purpose LLMs -- open-source LLaMA-3-8B and commercial GPT-3.5 -- on a curated dataset of 11,740 entries to predict key thermal properties: glass transition, melting, and decomposition temperatures. Using parameter-efficient fine-tuning and hyperparameter optimization, we benchmark these models against traditional fingerprinting-based approaches -- Polymer Genome, polyGNN, and polyBERT -- under single-task (ST) and multi-task (MT) learning. We find that while LLM-based methods approach traditional models in performance, they generally underperform in predictive accuracy and efficiency. LLaMA-3 consistently outperforms GPT-3.5, likely due to its tunable open-source architecture. Additionally, ST learning proves more effective than MT, as LLMs struggle to capture cross-property correlations, a key strength of traditional methods. Analysis of molecular embeddings reveals limitations of general-purpose LLMs in representing nuanced chemo-structural information compared to handcrafted features and domain-specific embeddings. These findings provide insight into the interplay between molecular embeddings and natural language processing, guiding LLM selection for polymer informatics.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for predicting polymer thermal properties
Comparing LLM performance with traditional fingerprinting methods
Assessing LLM limitations in capturing polymer structural nuances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tune LLMs for polymer property prediction
Use transfer learning to avoid fingerprinting
Benchmark LLMs against traditional fingerprinting methods
Sonakshi Gupta
Georgia Institute of Technology
NLP · LLM · Polymers · AI for materials
Akhlak-Ul Mahmood
School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta 30332, GA, USA.
Shivank Shukla
School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta 30332, GA, USA.
R. Ramprasad
School of Materials Science and Engineering, Georgia Institute of Technology, 771 Ferst Drive NW, Atlanta 30332, GA, USA.