Open Polymer Challenge: Post-Competition Report

📅 2025-12-09

🤖 AI Summary

Progress in polymer AI has been held back by the scarcity of large, high-quality, open polymer datasets. To address this, the Open Polymer Challenge (OPC) releases the first community-developed benchmark for polymer informatics. It comprises 10,000 polymers annotated with five properties (thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature) and centers on multi-task property prediction under realistic constraints: small data, label imbalance, and heterogeneous simulation sources. Participants applied feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The organizers release the test dataset together with ADEPT, a data generation pipeline that simulates more than 25 polymer properties. They also report lessons on data preparation, distribution shift, and cross-group simulation consistency that inform best practices for future large-scale polymer datasets.

📝 Abstract

Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.
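If, as the label-imbalance constraint suggests, each polymer carries labels for only some of the five properties, a multi-task score has to skip the missing entries. The sketch below is a minimal illustration under stated assumptions, not the competition's official metric. It assumes missing labels are stored as `None` and that the property keys are the hypothetical short names `Tg`, `FFV`, `Tc`, `Density`, and `Rg`. It also assumes that each property's MAE is divided by the range of its observed labels before the per-property scores are averaged.

```python
from typing import Dict, List, Optional

# Hypothetical short names for the five properties listed in the abstract.
PROPERTIES = ["Tg", "FFV", "Tc", "Density", "Rg"]

Row = Dict[str, Optional[float]]


def masked_multitask_mae(y_true: List[Row], y_pred: List[Row]) -> Dict[str, float]:
    """Range-normalized MAE per property, skipping missing (None) labels.

    Assumption: range normalization is an illustrative choice, not the
    scoring rule used in the challenge.
    """
    scores: Dict[str, float] = {}
    for prop in PROPERTIES:
        # Keep only the (label, prediction) pairs where the label exists.
        pairs = [(t[prop], p[prop]) for t, p in zip(y_true, y_pred)
                 if t.get(prop) is not None]
        if not pairs:
            continue  # property unlabelled in this split: nothing to score
        labels = [t for t, _ in pairs]
        span = max(labels) - min(labels) or 1.0  # avoid dividing by zero
        mae = sum(abs(t - p) for t, p in pairs) / len(pairs)
        # Normalizing keeps large-valued properties (e.g. Tg) from
        # dominating small-valued ones (e.g. FFV) in the average.
        scores[prop] = mae / span
    if not scores:
        raise ValueError("no observed labels to score")
    # Unweighted mean over the properties that had at least one label.
    scores["mean"] = sum(scores.values()) / len(scores)
    return scores
```

Averaging per-property scores, rather than pooling every absolute error into one mean, means a sparsely labelled property counts as much as a densely labelled one.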

Problem

Addresses the lack of large, high-quality, open polymer datasets

Focuses on multi-task polymer property prediction for virtual screening

Develops models under realistic constraints such as small data, label imbalance, and heterogeneous simulation sources
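One common way to counter label imbalance across tasks is to up-weight properties that have few labels. The report does not attribute this particular technique to any participant; it is an assumed illustration. The hypothetical helper `inverse_sqrt_count_weights` weights each property by 1/sqrt(number of labels), then rescales the weights so they average to 1.

```python
import math
from typing import Dict, List, Optional


def inverse_sqrt_count_weights(rows: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Hypothetical per-property loss weights: 1/sqrt(label count), mean-normalized to 1."""
    props = sorted({k for r in rows for k in r})
    # Count how many rows actually carry a label for each property.
    counts = {p: sum(r.get(p) is not None for r in rows) for p in props}
    # Properties with zero labels get no weight (they contribute no loss).
    raw = {p: 1.0 / math.sqrt(c) for p, c in counts.items() if c > 0}
    if not raw:
        return {}
    scale = len(raw) / sum(raw.values())  # make the weights average to 1
    return {p: w * scale for p, w in raw.items()}
```

These weights would multiply the per-property losses before summing. The square root softens the correction, so a property with very few labels is emphasized without dominating training.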

Innovation

Multi-task polymer property prediction using machine learning models

Feature-based augmentation and transfer learning for small datasets

Applies self-supervised pretraining and targeted ensemble strategies
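The report does not spell out what the participants' targeted ensembles look like. As a hypothetical illustration, one "targeted" variant fits a separate blend for each property and gives more weight to whichever base model validated best on that property. The model names, the nested-dictionary layout, and the inverse-validation-MAE weighting below are all assumptions.

```python
from typing import Dict, List


def per_property_blend(
    preds: Dict[str, Dict[str, List[float]]],  # model -> property -> predictions
    val_mae: Dict[str, Dict[str, float]],      # model -> property -> validation MAE
) -> Dict[str, List[float]]:
    """Hypothetical per-property ensemble weighted by 1 / validation MAE.

    Assumes every model predicts every property for the same samples.
    """
    models = list(preds)
    properties = preds[models[0]].keys()
    blended: Dict[str, List[float]] = {}
    for prop in properties:
        # Lower validation error on this property -> larger blending weight.
        inv = {m: 1.0 / max(val_mae[m][prop], 1e-12) for m in models}
        total = sum(inv.values())
        n = len(preds[models[0]][prop])
        blended[prop] = [
            sum(inv[m] * preds[m][prop][i] for m in models) / total
            for i in range(n)
        ]
    return blended
```

For example, with hypothetical models `gnn` (Tg validation MAE 10) and `gbdt` (Tg validation MAE 30), the Tg blend weights are 0.75 and 0.25. A different property can favor the other model.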
