🤖 AI Summary
Traditional polymer property prediction methods rely heavily on large-scale labeled datasets, handcrafted features, and domain-specific molecular representations, limiting generalizability and scalability.
Method: This study presents the first systematic evaluation of general-purpose large language models (LLMs)—specifically LLaMA-3-8B and GPT-3.5—for predicting polymer thermal properties (glass transition, melting, and decomposition temperatures). We fine-tuned both models on a dataset of 11,740 polymers using parameter-efficient fine-tuning (PEFT) with hyperparameter optimization and natural-language input encoding, analyzed their molecular embeddings, and benchmarked them against Polymer Genome, polyGNN, and polyBERT.
Contribution/Results: LLaMA-3-8B achieves state-of-the-art performance among LLMs under single-task settings, attaining accuracy comparable to—but slightly below—the best traditional models; GPT-3.5 performs significantly worse. Open-source LLMs demonstrate superior tunability, yet general-purpose LLMs still struggle to capture fine-grained chemical structural details critical for precise thermal property prediction.
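The PEFT setup mentioned above can be illustrated with a minimal LoRA-style configuration. The specific values and module names below are assumptions for demonstration, not the paper's reported hyperparameters:

```python
# Illustrative LoRA configuration for parameter-efficient fine-tuning
# of LLaMA-3-8B. All values are assumed for demonstration; the paper
# tunes these via hyperparameter optimization.
lora_config = {
    "r": 16,                # low-rank dimension of the adapter matrices
    "lora_alpha": 32,       # scaling factor applied to the adapter update
    "lora_dropout": 0.05,   # dropout on adapter inputs during training
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
    "task_type": "CAUSAL_LM",
}

# With PEFT, only the adapter parameters (~r * 2 * hidden_dim per targeted
# matrix) are trained, leaving the 8B base weights frozen.
```

In a real run, a dict like this would be passed to a PEFT library (e.g. as keyword arguments to a LoRA config object) before wrapping the base model.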
📝 Abstract
Machine learning has revolutionized polymer science by enabling rapid property prediction and generative design. Large language models (LLMs) offer further opportunities in polymer informatics by simplifying workflows that traditionally rely on large labeled datasets, handcrafted representations, and complex feature engineering. LLMs leverage natural language inputs through transfer learning, eliminating the need for explicit fingerprinting and streamlining training. In this study, we fine-tune general-purpose LLMs -- open-source LLaMA-3-8B and commercial GPT-3.5 -- on a curated dataset of 11,740 entries to predict key thermal properties: glass transition, melting, and decomposition temperatures. Using parameter-efficient fine-tuning and hyperparameter optimization, we benchmark these models against traditional fingerprinting-based approaches -- Polymer Genome, polyGNN, and polyBERT -- under single-task (ST) and multi-task (MT) learning. We find that while LLM-based methods approach traditional models in performance, they generally underperform in predictive accuracy and efficiency. LLaMA-3 consistently outperforms GPT-3.5, likely due to its tunable open-source architecture. Additionally, ST learning proves more effective than MT, as LLMs struggle to capture cross-property correlations, a key strength of traditional methods. Analysis of molecular embeddings reveals limitations of general-purpose LLMs in representing nuanced chemo-structural information compared to handcrafted features and domain-specific embeddings. These findings provide insight into the interplay between molecular embeddings and natural language processing, guiding LLM selection for polymer informatics.
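The natural-language input encoding described above, which replaces explicit fingerprinting, can be sketched as a simple prompt/completion formatter. The template, function name, and example values are illustrative assumptions, not the paper's exact format:

```python
from typing import Optional

def encode_example(smiles: str, prop: str,
                   value: Optional[float] = None) -> dict:
    """Turn one polymer record into a prompt/completion pair for LLM
    fine-tuning. The wording of the prompt is an assumed template.

    smiles: polymer repeat-unit SMILES (e.g. with [*] endpoints)
    prop:   'glass transition', 'melting', or 'decomposition'
    value:  target temperature in K; omitted at inference time
    """
    prompt = (
        f"What is the {prop} temperature of the polymer "
        f"with repeat unit SMILES {smiles}?"
    )
    # During fine-tuning the completion carries the label; at inference
    # it is left empty for the model to generate.
    completion = "" if value is None else f"{value:.1f} K"
    return {"prompt": prompt, "completion": completion}

# Hypothetical polystyrene-like record with an assumed Tg value:
example = encode_example("[*]CC([*])c1ccccc1", "glass transition", 373.0)
```

Pairs like this stand in for the handcrafted fingerprint vectors required by Polymer Genome, polyGNN, and polyBERT, which is what lets the LLM pipeline skip explicit feature engineering.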