🤖 AI Summary
Traditional polymer property prediction methods rely heavily on large-scale labeled datasets, handcrafted features, and domain-specific molecular representations, limiting generalizability and scalability.
Method: This study presents the first systematic evaluation of general-purpose large language models (LLMs)—specifically LLaMA-3-8B and GPT-3.5—for predicting polymer thermal properties (glass transition, melting, and decomposition temperatures). We fine-tuned both models on a dataset of 11,740 polymers using parameter-efficient fine-tuning (PEFT) with hyperparameter optimization and natural-language input encoding, analyzed their molecular embeddings, and benchmarked them against Polymer Genome, polyGNN, and polyBERT.
Contribution/Results: LLaMA-3-8B achieves state-of-the-art performance among LLMs under single-task settings, attaining accuracy comparable to—but slightly below—the best traditional models; GPT-3.5 performs significantly worse. Open-source LLMs demonstrate superior tunability, yet general-purpose LLMs still struggle to capture fine-grained chemical structural details critical for precise thermal property prediction.
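The PEFT setup mentioned above can be illustrated with a minimal LoRA-style configuration. The specific values and module names below are assumptions for demonstration, not the paper's reported hyperparameters:

```python
# Illustrative LoRA configuration for parameter-efficient fine-tuning
# of LLaMA-3-8B. All values are assumed for demonstration; the paper
# tunes these via hyperparameter optimization.
lora_config = {
    "r": 16,                # low-rank dimension of the adapter matrices
    "lora_alpha": 32,       # scaling factor applied to the adapter update
    "lora_dropout": 0.05,   # dropout on adapter inputs during training
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
    "task_type": "CAUSAL_LM",
}

# With PEFT, only the adapter parameters (~r * 2 * hidden_dim per targeted
# matrix) are trained, leaving the 8B base weights frozen.
```

In a real run, a dict like this would be passed to a PEFT library (e.g. as keyword arguments to a LoRA config object) before wrapping the base model.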
📝 Abstract
Machine learning has revolutionized polymer science by enabling rapid property prediction and generative design. Large language models (LLMs) offer further opportunities in polymer informatics by simplifying workflows that traditionally rely on large labeled datasets, handcrafted representations, and complex feature engineering. LLMs leverage natural language inputs through transfer learning, eliminating the need for explicit fingerprinting and streamlining training. In this study, we fine-tune general-purpose LLMs -- open-source LLaMA-3-8B and commercial GPT-3.5 -- on a curated dataset of 11,740 entries to predict key thermal properties: glass transition, melting, and decomposition temperatures. Using parameter-efficient fine-tuning and hyperparameter optimization, we benchmark these models against traditional fingerprinting-based approaches -- Polymer Genome, polyGNN, and polyBERT -- under single-task (ST) and multi-task (MT) learning. We find that while LLM-based methods approach traditional models in performance, they generally underperform in predictive accuracy and efficiency. LLaMA-3 consistently outperforms GPT-3.5, likely due to its tunable open-source architecture. Additionally, ST learning proves more effective than MT, as LLMs struggle to capture cross-property correlations, a key strength of traditional methods. Analysis of molecular embeddings reveals limitations of general-purpose LLMs in representing nuanced chemo-structural information compared to handcrafted features and domain-specific embeddings. These findings provide insight into the interplay between molecular embeddings and natural language processing, guiding LLM selection for polymer informatics.
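The natural-language input encoding described above, which replaces explicit fingerprinting, can be sketched as a simple prompt/completion formatter. The template, function name, and example values are illustrative assumptions, not the paper's exact format:

```python
from typing import Optional

def encode_example(smiles: str, prop: str,
                   value: Optional[float] = None) -> dict:
    """Turn one polymer record into a prompt/completion pair for LLM
    fine-tuning. The wording of the prompt is an assumed template.

    smiles: polymer repeat-unit SMILES (e.g. with [*] endpoints)
    prop:   'glass transition', 'melting', or 'decomposition'
    value:  target temperature in K; omitted at inference time
    """
    prompt = (
        f"What is the {prop} temperature of the polymer "
        f"with repeat unit SMILES {smiles}?"
    )
    # During fine-tuning the completion carries the label; at inference
    # it is left empty for the model to generate.
    completion = "" if value is None else f"{value:.1f} K"
    return {"prompt": prompt, "completion": completion}

# Hypothetical polystyrene-like record with an assumed Tg value:
example = encode_example("[*]CC([*])c1ccccc1", "glass transition", 373.0)
```

Pairs like this stand in for the handcrafted fingerprint vectors required by Polymer Genome, polyGNN, and polyBERT, which is what lets the LLM pipeline skip explicit feature engineering.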