Language Native Lightly Structured Databases for Large Language Model Driven Composite Materials Research

📅 2025-09-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Knowledge in materials research lives largely in unstructured textual narratives, which makes systematic database construction difficult and limits machine learning applications. Method: This work introduces a “language-native, lightly structured database” paradigm, demonstrated on boron nitride nanosheet (BNNS)-polymer thermally conductive composites. It integrates heterogeneous textual fragments, spanning preparation protocols, characterization reports, theoretical calculations, and mechanistic reasoning, while preserving the scientific narrative and linking records to evidence snippets. The database supports composite queries that combine semantic search, keyword matching, and numerical filtering. Contribution/Results: Serving as a knowledge foundation, it lets retrieval-augmented generation (RAG) and tool-augmented agents interleave retrieval with reasoning in materials discovery. It supports traceable, verifiable standard operating procedure (SOP) generation, moving large language models in materials R&D from broad “skim-reading” toward precise, context-aware “deep interpretation.”

📝 Abstract
Chemical and materials research has traditionally relied heavily on narrative knowledge, with progress often driven by language-based descriptions of principles, mechanisms, and experimental experience rather than tables, which limits what conventional databases and ML can exploit. We present a language-native database for boron nitride nanosheet (BNNS)-polymer thermally conductive composites that captures lightly structured information from papers across preparation, characterization, theory-computation, and mechanistic reasoning, with evidence-linked snippets. Records are organized in a heterogeneous database and queried via composite retrieval that combines semantics, keywords, and value filters. The system can synthesize literature into accurate, verifiable, expert-style guidance. This substrate enables high-fidelity, efficient Retrieval-Augmented Generation (RAG) and tool-augmented agents to interleave retrieval with reasoning and deliver actionable SOPs. The framework supplies the language-rich foundation required for LLM-driven materials discovery.
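To make "lightly structured" and "evidence-linked" concrete, here is a minimal sketch of what one record might look like. The abstract does not describe the actual schema, so the class names, field names, and the example values below are all hypothetical placeholders; only the four knowledge categories come from the abstract.

```python
from dataclasses import dataclass, field

# The four knowledge categories named in the abstract.
CATEGORIES = {"preparation", "characterization", "theory-computation", "mechanism"}


@dataclass
class Snippet:
    """A verbatim passage plus a pointer back to where it came from."""
    text: str
    source: str  # hypothetical citation key, not a real reference


@dataclass
class Record:
    """One lightly structured entry: free text, a few typed values, evidence."""
    category: str
    narrative: str                                 # the preserved scientific narrative
    values: dict = field(default_factory=dict)     # optional numeric fields
    evidence: list = field(default_factory=list)   # list of Snippet

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if not self.evidence:
            # Every record must be traceable to at least one snippet.
            raise ValueError("a record needs at least one evidence snippet")


# Placeholder content for illustration only; not data from the paper.
record = Record(
    category="preparation",
    narrative="BNNS were exfoliated and dispersed in a polymer matrix.",
    values={"filler_loading_wt_pct": 20.0},
    evidence=[Snippet(text="(verbatim passage here)", source="paper-001")],
)
```

Keeping the narrative as free text and attaching only a few typed values is one way to stay "language-native" while still allowing numerical filtering; the evidence check in `__post_init__` is an assumption about how traceability could be enforced.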
Problem

Research questions and friction points this paper is trying to address.

Capturing lightly structured information from materials research papers
Enabling high-fidelity, efficient Retrieval-Augmented Generation for composites
Providing a language-rich foundation for LLM-driven materials discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-native database with evidence-linked snippets
Composite retrieval using semantics and filters
Enables Retrieval-Augmented Generation for materials discovery
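Composite retrieval, as described in the points above, combines a semantic score with keyword and value filters. The sketch below illustrates that combination under loud assumptions: it swaps in a bag-of-words cosine similarity as a stand-in for real embedding-based semantic search, and the function names, record layout, and toy records are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def composite_search(records, query, keywords=(), ranges=None, top_k=3):
    """Filter by required keywords and numeric ranges, then rank by similarity."""
    ranges = ranges or {}
    hits = []
    for rec in records:
        words = set(tokens(rec["text"]))
        # Keyword filter: every required keyword must occur in the text.
        if not all(k.lower() in words for k in keywords):
            continue
        # Value filter: every named field must exist and fall inside [lo, hi].
        in_range = all(
            name in rec["values"] and lo <= rec["values"][name] <= hi
            for name, (lo, hi) in ranges.items()
        )
        if not in_range:
            continue
        hits.append((cosine(query, rec["text"]), rec["id"]))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top_k]


# Toy records for illustration only; not data from the paper.
records = [
    {"id": "r1", "text": "BNNS exfoliated by ball milling and dispersed in epoxy",
     "values": {"loading_wt_pct": 20.0}},
    {"id": "r2", "text": "BNNS aligned in a PVA film by vacuum filtration",
     "values": {"loading_wt_pct": 40.0}},
    {"id": "r3", "text": "Graphene filler dispersed in epoxy resin",
     "values": {"loading_wt_pct": 20.0}},
]

hits = composite_search(
    records,
    query="ball milling BNNS epoxy",
    keywords=("bnns",),
    ranges={"loading_wt_pct": (10.0, 30.0)},
)
```

Here `r2` is dropped by the value filter and `r3` by the keyword filter, so only `r1` is ranked. Applying the hard filters before scoring is a design choice made for this sketch; a real system could equally blend the three signals into a single score.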
Yuze Liu
School of Science, Tianjin University, Tianjin 300072, China
Zhaoyuan Zhang
Tianjin Language Intelligence Technology Co., Ltd., Haihe Education Park, Jinnan District, Tianjin City, Postal Code: 300350
Xiangsheng Zeng
Shanghai boron moment new material Technology Co., Ltd., No. 3938 Yunchuan Road, Baoshan District, Shanghai City, Postal Code: 200949
Yihe Zhang
Research Scientist, University of Louisiana at Lafayette
AI Security, Social Network Security
Leping Yu
Shanghai boron moment new material Technology Co., Ltd., No. 3938 Yunchuan Road, Baoshan District, Shanghai City, Postal Code: 200949
Lejia Wang
Shanghai boron moment new material Technology Co., Ltd., No. 3938 Yunchuan Road, Baoshan District, Shanghai City, Postal Code: 200949
Xi Yu
School of Science, Tianjin University, Tianjin 300072, China