Omics-scale polymer computational database transferable to real-world artificial intelligence applications

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Polymer science has long suffered from a scarcity of large-scale, open-access data, which has hindered AI-driven innovation. Method: We introduce PolyOmics, the largest publicly available polymer molecular dynamics simulation database to date (>100,000 polymers). It is generated by a fully automated high-throughput simulation pipeline and used within a pretrain/fine-tune machine learning framework. We propose and empirically validate a “simulation-to-reality” transfer learning paradigm for polymer property prediction. Contribution/Results: Systematic experiments reveal a power-law scaling relationship between database size and the generalization performance of the transferred models, providing empirical support for data-driven scientific discovery. PolyOmics significantly improves prediction accuracy in low-data regimes, enabling robust property estimation from limited experimental samples. This bridges the gap between academic AI research and industrial polymer development, supporting rapid, data-informed materials design and its translation into real-world applications.

📝 Abstract
Developing large-scale foundational datasets is a critical milestone in advancing artificial intelligence (AI)-driven scientific innovation. However, unlike AI-mature fields such as natural language processing, materials science, particularly polymer research, has lagged significantly in developing extensive open datasets. This lag is primarily due to the high costs of polymer synthesis and property measurements, along with the vastness and complexity of the chemical space. This study presents PolyOmics, an omics-scale computational database generated through fully automated molecular dynamics simulation pipelines that provide diverse physical properties for over $10^5$ polymeric materials. The PolyOmics database is being developed collaboratively by approximately 260 researchers from 48 institutions to bridge the gap between academia and industry. Machine learning models pretrained on PolyOmics can be efficiently fine-tuned for a wide range of real-world downstream tasks, even when only limited experimental data are available. Notably, the generalisation capability of these simulation-to-real transfer models improves significantly as the size of the PolyOmics database increases, exhibiting power-law scaling. The emergence of scaling laws supports the "more is better" principle, highlighting the significance of ultralarge-scale computational materials data for improving real-world prediction performance. This unprecedented omics-scale database reveals vast unexplored regions of polymer materials, providing a foundation for AI-driven polymer science.
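The power-law scaling claim can be made concrete with a short sketch. Assume the test error $E$ of a transferred model depends on the pretraining database size $N$ as $E(N) \approx A N^{-\alpha}$. Taking logarithms gives $\log E = \log A - \alpha \log N$, so $A$ and $\alpha$ follow from an ordinary least-squares line in log-log space. The functional form, the helper names `fit_power_law` and `predict_error`, and the synthetic numbers are illustrative assumptions. They are not the paper's fitting procedure or results.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit errors ~ A * sizes**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; returns (A, alpha).
    """
    if len(sizes) != len(errors) or len(sizes) < 2:
        raise ValueError("need at least two (size, error) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # log E = log A - alpha * log N, so the fitted slope equals -alpha
    return math.exp(intercept), -slope


def predict_error(a: float, alpha: float, size: float) -> float:
    """Extrapolate the fitted power law to a new database size."""
    return a * size ** (-alpha)


if __name__ == "__main__":
    # Synthetic errors generated from E = 2.0 * N**-0.3 (illustrative, not PolyOmics data)
    sizes = [1e3, 3e3, 1e4, 3e4, 1e5]
    errors = [2.0 * n ** -0.3 for n in sizes]
    a, alpha = fit_power_law(sizes, errors)
    print(f"A = {a:.3f}, alpha = {alpha:.3f}")
    print(f"extrapolated error at N = 1e6: {predict_error(a, alpha, 1e6):.4f}")
```

Fitting in log-log space keeps the estimate a closed-form linear regression. If the error instead levels off at an irreducible floor, a form such as $E(N) = A N^{-\alpha} + C$ would need a nonlinear fit.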
Problem

Research questions and friction points this paper is trying to address.

Developing large-scale polymer datasets for AI applications in materials science
Addressing high costs and complexity of polymer synthesis and property measurements
Bridging the gap between computational simulations and real-world experimental data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated molecular dynamics simulation pipelines generate the database
PolyOmics database enables simulation-to-real transfer learning
Generalization improves with database size, following power-law scaling
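The simulation-to-real idea can be illustrated with a deliberately small sketch. It makes several assumptions. A linear model stands in for the paper's machine learning models, two hypothetical descriptor features stand in for a polymer representation, and the data are noise-free and synthetic. The helper names (`pretrain_linear`, `fit_head`) and the scale and offset of the sim-to-real bias are illustrative inventions. The sketch also swaps full fine-tuning for its simplest variant: the pretrained predictor is frozen, and only a linear output head is fitted on a handful of "experimental" points.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def predict(w: Vector, b: float, x: Vector) -> float:
    """Linear model w.x + b (stand-in for a pretrained property predictor)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def pretrain_linear(xs: Sequence[Vector], ys: Sequence[float],
                    lr: float = 0.05, epochs: int = 500) -> Tuple[Vector, float]:
    """Pretrain on the large simulated set by full-batch gradient descent on mean squared error."""
    d, n = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        # Gradient of mean((w.x + b - y)^2) is 2 * mean(err * x) and 2 * mean(err)
        w = [wj - lr * 2 * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * 2 * grad_b / n
    return w, b


def fit_head(base_preds: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form 1-D least squares y ~ a * base + c on the small experimental set."""
    n = len(base_preds)
    mx, my = sum(base_preds) / n, sum(ys) / n
    sxx = sum((p - mx) ** 2 for p in base_preds)
    sxy = sum((p - mx) * (y - my) for p, y in zip(base_preds, ys))
    a = sxy / sxx
    return a, my - a * mx


if __name__ == "__main__":
    rng = random.Random(0)
    true_w, true_b = [0.8, -0.5], 0.1  # hypothetical trend shared by simulation and experiment

    # Large simulated set (stand-in for MD-computed properties)
    sim_x = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    sim_y = [predict(true_w, true_b, x) for x in sim_x]
    w, b = pretrain_linear(sim_x, sim_y)

    # Five "experimental" points with an assumed systematic bias: scale 1.3, offset 0.4
    exp_x = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(5)]
    exp_y = [1.3 * predict(true_w, true_b, x) + 0.4 for x in exp_x]
    a, c = fit_head([predict(w, b, x) for x in exp_x], exp_y)
    print(f"head: a = {a:.3f}, c = {c:.3f}")
```

The frozen predictor already captures the trend learned from the large simulated set, so only two parameters must be estimated from the scarce experimental data. That is why five points suffice here. With real measurements, noise and a sim-to-real gap that is not a simple linear rescaling would call for fine-tuning more of the model.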
Ryo Yoshida
The University of Tokyo
Yoshihiro Hayashi
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Hidemine Furuya
School of Materials and Chemical Technology, Institute of Science Tokyo, Meguro-ku, Tokyo 152-8550, Japan
Ryohei Hosoya
School of Materials and Chemical Technology, Institute of Science Tokyo, Meguro-ku, Tokyo 152-8550, Japan
Kazuyoshi Kaneko
Research & Advanced Development Division, The Yokohama Rubber Co., Ltd., Hiratsuka, Kanagawa, 254-8601, Japan
Hiroki Sugisawa
Science & Innovation Center, Mitsubishi Chemical Corporation, Yokohama 227-8502, Japan
Yu Kaneko
Business Development Center, R&D Headquarters, Daicel Corporation, Himeji, Hyogo 671-1283, Japan
Aiko Takahashi
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Yoh Noguchi
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Shun Nanjo
Graduate Institute for Advanced Studies, SOKENDAI, Tachikawa, Tokyo 190-8562, Japan
Keiko Shinoda
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Tomu Hamakawa
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Mitsuru Ohno
Business Development Center, R&D Headquarters, Daicel Corporation, Himeji, Hyogo 671-1283, Japan
Takuya Kitamura
Imaging & Informatics Laboratories, FUJIFILM Corporation, Ashigarakami-gun, Kanagawa 258-8577, Japan
Misaki Yonekawa
Imaging & Informatics Laboratories, FUJIFILM Corporation, Ashigarakami-gun, Kanagawa 258-8577, Japan
Stephen Wu
Hamilton College
Masato Ohnishi
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Chang Liu
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan
Teruki Tsurimoto
R&D Center, Corporate, Sekisui Chemical Co., Ltd., Mishima-gun, Osaka 618-0021, Japan
Arifin
Materials Informatics Initiative, JSR Corporation, Kawasaki, Kanagawa 210-0821, Japan
Araki Wakiuchi
Materials Informatics Initiative, JSR Corporation, Kawasaki, Kanagawa 210-0821, Japan
Kohei Noda
Materials Informatics Initiative, JSR Corporation, Kawasaki, Kanagawa 210-0821, Japan
Junko Morikawa
School of Materials and Chemical Technology, Institute of Science Tokyo, Meguro-ku, Tokyo 152-8550, Japan
Teruaki Hayakawa
School of Materials and Chemical Technology, Institute of Science Tokyo, Meguro-ku, Tokyo 152-8550, Japan
Junichiro Shiomi
The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tachikawa, Tokyo 190-8562, Japan