Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks

📅 2026-01-22
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the lack of domain-specific expertise and the inadequate alignment of current large language models (LLMs) for polymer design. To bridge this gap, the authors introduce PolyBench, a benchmark of 125,000 structured reasoning tasks built on a knowledge base of over 13 million experimental and synthesis data points. They propose a knowledge-augmented reasoning distillation approach that fine-tunes smaller language models (7B–14B parameters) using hierarchical task organization and chain-of-thought (CoT) distillation. This method substantially improves alignment and generalization on specialized scientific tasks: the resulting models outperform open-source models of comparable scale on PolyBench and even surpass state-of-the-art closed-source LLMs. Consistent gains across other polymer-related benchmarks further demonstrate the effectiveness and transferability of the framework.
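
To make the distillation recipe concrete, here is a minimal sketch of how a knowledge-augmented CoT training record could be assembled. It is an illustration under assumed interfaces (`PolymerTask`, `knowledge_base.retrieve`, and `teacher.complete` are hypothetical names), not the authors' actual pipeline.

```python
# Minimal sketch of knowledge-augmented CoT distillation data construction.
# NOTE: PolymerTask, knowledge_base.retrieve, and teacher.complete are
# hypothetical interfaces used for illustration, not the paper's actual API.
from dataclasses import dataclass

@dataclass
class PolymerTask:
    question: str  # e.g., a property-prediction query for a given monomer
    answer: str    # ground-truth label drawn from the knowledge base

def build_distillation_record(task: PolymerTask, knowledge_base, teacher) -> dict:
    """Augment one task with retrieved domain facts and a teacher CoT trace."""
    # 1. Retrieve relevant polymer facts (experimental/synthesis data points).
    facts = knowledge_base.retrieve(task.question, top_k=5)

    # 2. Prompt a strong teacher model to reason step by step toward the
    #    known answer, grounded in the retrieved facts.
    prompt = (
        "Relevant polymer knowledge:\n" + "\n".join(facts)
        + f"\n\nQuestion: {task.question}\n"
        + f"Explain step by step why the answer is {task.answer}."
    )
    cot_trace = teacher.complete(prompt)

    # 3. Package as a supervised fine-tuning example: the 7B-14B student
    #    learns to emit the reasoning chain before the final answer.
    return {"input": task.question, "output": f"{cot_trace}\nAnswer: {task.answer}"}
```

Fine-tuning the student on such (input, output) pairs is what would let it internalize both the retrieved domain knowledge and the teacher's reasoning pattern.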

📝 Abstract
Research in AI4Science has shown promise in many science applications, including polymer design. However, current LLMs prove ineffective on this problem space because (i) most models lack polymer-specific knowledge, and (ii) existing aligned models lack coverage of the knowledge and capabilities relevant to polymer design. Addressing this, we introduce PolyBench, a large-scale training and test benchmark of more than 125K polymer-design-related tasks, leveraging a knowledge base of 13M+ data points obtained from experimental and synthetic sources to ensure broad coverage of polymers and their properties. For effective alignment using PolyBench, we introduce a knowledge-augmented reasoning distillation method that augments the dataset with structured chain-of-thought (CoT) traces. Furthermore, tasks in PolyBench are organized from simple to complex analytical reasoning problems, enabling generalization tests and diagnostic probes across the problem space. Experiments show that small language models (SLMs) of 7B to 14B parameters trained on PolyBench outperform similarly sized models, and even closed-source frontier LLMs, on the PolyBench test set, while also demonstrating gains on other polymer benchmarks.
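
The abstract's simple-to-complex task organization suggests a natural generalization probe: train on easier tiers and measure accuracy per tier. The sketch below illustrates one such probe; the tier names and the `model.finetune` / `model.answer` interfaces are assumptions, not PolyBench's actual schema.

```python
# Illustrative generalization probe over simple-to-complex task tiers.
# NOTE: the tier names and the model.finetune / model.answer interfaces are
# assumptions for illustration, not PolyBench's actual schema.
TIERS = ["property_lookup", "single_step_prediction", "multi_step_analysis"]

def generalization_probe(model, tasks_by_tier, train_tiers=("property_lookup",)):
    """Fine-tune on the easier tiers, then report accuracy on every tier."""
    train_set = [t for tier in train_tiers for t in tasks_by_tier[tier]]
    model.finetune(train_set)
    return {
        tier: sum(model.answer(t["input"]) == t["answer"]
                  for t in tasks_by_tier[tier]) / len(tasks_by_tier[tier])
        for tier in TIERS
    }
```

A widening accuracy gap between the trained tiers and the held-out complex tiers would localize where a model's analytical reasoning breaks down.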
Problem

Research questions and friction points this paper is trying to address.

polymer design
large language models
AI4Science
domain-specific knowledge
reasoning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

PolyBench
knowledge-augmented reasoning distillation
chain-of-thought (CoT)
polymer design
AI4Science