Open Polymer Challenge: Post-Competition Report

📅 2025-12-09
🤖 AI Summary
Polymer AI research is held back by the scarcity of large, high-quality, openly accessible datasets. To address this, the Open Polymer Challenge (OPC) releases the first community-developed polymer informatics benchmark aimed at sustainable polymeric materials: 10,000 polymers annotated with five physicochemical properties (thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature), posed as a multi-task prediction problem with small data, label imbalance, and heterogeneous simulation sources. Participating teams relied on feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The organizers also release the test dataset and ADEPT, a molecular dynamics data generation pipeline that simulates more than 25 properties. The competition surfaced lessons on data preparation, distribution shift, and cross-group simulation consistency, and its models, analysis, and data are intended to accelerate virtual screening of sustainable, energy-efficient polymers.

📝 Abstract
Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.
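Because each polymer is labeled for only some of the five properties, a multi-task score has to skip missing labels and keep rare or wide-range properties from being swamped. The sketch below shows one way to do that with a masked, weighted mean absolute error. The weighting (inverse label range times a normalized inverse square root of the label count) is an assumption chosen for illustration, not a metric stated in this report, and `weighted_masked_mae` is a hypothetical helper; missing labels are assumed to be encoded as `None`.

```python
import math
from typing import List, Optional, Sequence


def weighted_masked_mae(
    y_true: Sequence[Sequence[Optional[float]]],
    y_pred: Sequence[Sequence[float]],
) -> float:
    """Mean absolute error over several properties, skipping missing labels.

    ASSUMED weighting (illustrative, not taken from the report):
      w_i = (1 / r_i) * K * sqrt(1 / n_i) / sum_j sqrt(1 / n_j)
    where r_i is the observed label range of property i, n_i its label count,
    and K the number of properties.
    """
    k = len(y_true[0])
    # Gather the observed (non-missing) labels for each property column.
    observed: List[List[float]] = [
        [row[i] for row in y_true if row[i] is not None] for i in range(k)
    ]
    for i, vals in enumerate(observed):
        if len(vals) < 2 or max(vals) == min(vals):
            raise ValueError(f"property {i} needs at least two distinct labels")

    inv_sqrt = [math.sqrt(1.0 / len(vals)) for vals in observed]
    norm = sum(inv_sqrt)
    ranges = [max(vals) - min(vals) for vals in observed]
    # Sparsely labeled properties get larger weights; wide-range ones get smaller.
    weights = [k * s / norm / r for s, r in zip(inv_sqrt, ranges)]

    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        for i in range(k):
            if truth[i] is not None:  # mask: unlabeled entries contribute nothing
                total += weights[i] * abs(pred[i] - truth[i])
    # Average over polymers (rows).
    return total / len(y_true)
```

For example, with two properties, labels `[[1.0, None], [3.0, 10.0], [None, 20.0]]` and predictions `[[2.0, 0.0], [3.0, 12.0], [5.0, 20.0]]`, both properties have two labels, so the weights reduce to 1/2 and 1/10, and the score is 0.7 / 3 ≈ 0.233. The wildly wrong predictions in the masked cells have no effect.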
Problem

Research questions and friction points this paper is trying to address.

Addresses the lack of large, high-quality, open polymer datasets
Focuses on multi-task polymer property prediction for virtual screening
Develops models under realistic constraints such as small data, label imbalance, and heterogeneous simulation sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task polymer property prediction using machine learning models
Feature-based augmentation and transfer learning for small datasets
Self-supervised pretraining and targeted ensemble strategies
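The report does not spell out what "targeted ensemble strategies" means in detail. One plausible reading, sketched below under that assumption, is to blend several base models separately for each property, giving more weight to models with lower validation error on that property. The helpers `inverse_error_weights` and `ensemble_predict` and the property keys such as `"Tg"` are hypothetical names used only for illustration.

```python
from typing import Dict, List


def inverse_error_weights(val_mae: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Per property, weight each base model by 1 / its validation MAE (weights sum to 1)."""
    weights: Dict[str, List[float]] = {}
    for prop, errors in val_mae.items():
        if any(e <= 0 for e in errors):
            raise ValueError(f"validation MAE for {prop} must be positive")
        inv = [1.0 / e for e in errors]
        total = sum(inv)
        weights[prop] = [w / total for w in inv]
    return weights


def ensemble_predict(
    preds: Dict[str, List[float]], weights: Dict[str, List[float]]
) -> Dict[str, float]:
    """Blend base-model predictions for one polymer, property by property."""
    return {
        prop: sum(w * p for w, p in zip(weights[prop], model_preds))
        for prop, model_preds in preds.items()
    }
```

With validation MAEs of 10 and 30 for two models on a hypothetical `"Tg"` target, the weights come out as 0.75 and 0.25, so predictions of 100 and 120 blend to 105. Weighting per property rather than globally lets a model that is strong on one property but weak on another contribute only where it helps.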
Gang Liu
University of Notre Dame

Sobin Alosious
University of Notre Dame

Subhamoy Mahajan
University of Wisconsin–Madison

Eric Inae
Graduate Student, University of Notre Dame

Yihan Zhu
University of Notre Dame

Yuhan Liu
University of Notre Dame

Renzheng Zhang
University of Notre Dame

Jiaxin Xu
University of Notre Dame
Material Informatics, Machine Learning, XAI

Addison Howard
Kaggle

Ying Li
University of Wisconsin–Madison

Tengfei Luo
Dorini Family Professor, MÖNSTER (MOlecular/Nano-Scale Transport & Energy Research) Lab
nanotechnology, polymer, heat transfer, mass transfer, water treatment

Meng Jiang
University of Notre Dame