Retrieval Augmented Generation of Literature-derived Polymer Knowledge: The Example of a Biodegradable Polymer Expert System

📅 2026-02-18

🤖 AI Summary

This study addresses the inconsistent terminology and fragmented context that hinder systematic use of the large, unstructured literature on polyhydroxyalkanoates (PHAs), a class of biodegradable polymers. The authors propose an expert system that combines large language models with retrieval-augmented generation (RAG) through two pipelines: VectorRAG, based on semantic paragraph embeddings, and GraphRAG, grounded in a canonicalized knowledge graph. Built from over 1,000 scientific papers, the pipelines support context-preserving knowledge extraction, entity disambiguation, and multi-hop reasoning. The knowledge representation strategy balances contextual fidelity with cross-study connectivity, yielding an interpretable, traceable question-answering system for polymer science. In the evaluation, GraphRAG achieves higher precision and interpretability, while VectorRAG provides broader recall; both are also compared against general-purpose systems such as GPT and Gemini. Validation by a domain chemist confirms that the tailored pipelines, particularly GraphRAG, give well-grounded, citation-reliable answers, and the work offers a reusable RAG framework for materials science assistants.
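To make the difference between the two retrieval styles concrete, here is a minimal toy sketch. It is not the authors' implementation: the corpus sentences, the triples, the bag-of-words "embedding", and the function names `vector_retrieve` and `graph_retrieve` are all hypothetical stand-ins chosen for illustration. A real system would presumably use a learned embedding model and a graph database.

```python
import math
from collections import Counter, deque

# Hypothetical toy corpus; the sentences are illustrative, not taken from the paper.
PARAGRAPHS = {
    "p1": "PHB is produced by Cupriavidus necator from glucose",
    "p2": "PHBV shows lower crystallinity than PHB",
    "p3": "glucose feedstock cost affects polymer production",
}

def embed(text):
    """Stand-in embedding: a bag-of-words count vector (assumption, not a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def vector_retrieve(query, k=2):
    """Vector-style retrieval: rank paragraphs by similarity to the query, return top-k ids."""
    q = embed(query)
    ranked = sorted(PARAGRAPHS, key=lambda pid: cosine(q, embed(PARAGRAPHS[pid])), reverse=True)
    return ranked[:k]

# Hypothetical triples (subject, relation, object, source paragraph id).
TRIPLES = [
    ("Cupriavidus necator", "produces", "PHB", "p1"),
    ("Cupriavidus necator", "consumes", "glucose", "p1"),
    ("PHBV", "less_crystalline_than", "PHB", "p2"),
]

def graph_retrieve(start, max_hops=2):
    """Graph-style retrieval: breadth-first walk collecting triples within max_hops of start."""
    seen, found = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for s, r, o, src in TRIPLES:
            if node in (s, o) and (s, r, o, src) not in found:
                found.append((s, r, o, src))
                neighbor = o if node == s else s
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return found

print(vector_retrieve("PHB crystallinity"))
for triple in graph_retrieve("glucose", max_hops=2):
    print(triple)
```

In this toy setup, the vector lookup returns whole paragraphs that resemble the query, while the graph walk links glucose to PHB through the shared organism entity and keeps the source id of each triple, which is the kind of traceable multi-hop connection the summary attributes to GraphRAG.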

📝 Abstract

Polymer literature contains a large and growing body of experimental knowledge, yet much of it is buried in unstructured text and inconsistent terminology, making systematic retrieval and reasoning difficult. Existing tools typically extract narrow, study-specific facts in isolation, failing to preserve the cross-study context required to answer broader scientific questions. Retrieval-augmented generation (RAG) offers a promising way to overcome this limitation by combining large language models (LLMs) with external retrieval, but its effectiveness depends strongly on how domain knowledge is represented. In this work, we develop two retrieval pipelines: a dense semantic vector-based approach (VectorRAG) and a graph-based approach (GraphRAG). Using over 1,000 polyhydroxyalkanoate (PHA) papers, we construct context-preserving paragraph embeddings and a canonicalized structured knowledge graph supporting entity disambiguation and multi-hop reasoning. We evaluate these pipelines through standard retrieval metrics, comparisons with general state-of-the-art systems such as GPT and Gemini, and qualitative validation by a domain chemist. The results show that GraphRAG achieves higher precision and interpretability, while VectorRAG provides broader recall, highlighting complementary trade-offs. Expert validation further confirms that the tailored pipelines, particularly GraphRAG, produce well-grounded, citation-reliable responses with strong domain relevance. By grounding every statement in evidence, these systems enable researchers to navigate the literature, compare findings across studies, and uncover patterns that are difficult to extract manually. More broadly, this work establishes a practical framework for building materials science assistants using curated corpora and retrieval design, reducing reliance on proprietary models while enabling trustworthy literature analysis at scale.
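The precision versus recall trade-off mentioned in the abstract can be illustrated with a short sketch. The abstract does not say which "standard retrieval metrics" were used, so precision and recall at k are an assumption here, and the document ids and result lists below are invented for illustration rather than taken from the paper's results.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision, recall) over the first k retrieved ids.

    Precision is computed over the items actually returned (at most k),
    so a retriever that returns fewer than k items is not penalized for it.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ground truth and two hypothetical retrievers.
relevant = {"a", "b", "c", "d"}
narrow = ["a", "b"]                      # few results, all relevant
broad = ["a", "x", "b", "y", "c", "z"]   # more results, some irrelevant

print(precision_recall_at_k(narrow, relevant, k=6))  # (1.0, 0.5)
print(precision_recall_at_k(broad, relevant, k=6))   # (0.5, 0.75)
```

The narrow retriever returns only relevant items but misses half of them, while the broad one finds more of the relevant set at the cost of noise. This mirrors, in a simplified form, the complementary behavior the abstract reports for GraphRAG and VectorRAG.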

Problem

Research questions and friction points this paper is trying to address.

polymer literature

unstructured text

knowledge retrieval

cross-study context

scientific reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation

Knowledge Graph

Polymer Literature Mining