🤖 AI Summary
Organic photovoltaic (OPV) material discovery has long been hindered by the difficulty of co-designing donor–acceptor molecular pairs; existing approaches typically optimize individual components in isolation and lack a unified modeling framework. Method: We propose a dual-path machine learning paradigm that introduces OPV2D, the largest experimentally validated OPV dataset to date, and integrates hierarchical graph neural networks, multi-task prediction of optoelectronic properties, molecular orbital energy estimation, and a reinforcement-learning–guided MatGPT generative model for joint donor–acceptor pair generation and closed-loop optimization driven by power conversion efficiency (PCE). Contribution/Results: Our framework significantly improves both predictive accuracy and synthetic feasibility for high-PCE (>18%) materials, establishing the first end-to-end, scalable pipeline for accelerated OPV material discovery.
📝 Abstract
Organic photovoltaic (OPV) materials offer a promising path toward sustainable energy generation, but their development is limited by the difficulty of identifying high-performance donor-acceptor pairs with high power conversion efficiencies (PCEs). Existing design strategies typically focus on either the donor or the acceptor alone, rather than using a unified approach capable of modeling both components. In this work, we introduce a dual machine learning framework for OPV discovery that combines predictive modeling with generative molecular design. We present the Organic Photovoltaic Donor Acceptor Dataset (OPV2D), the largest curated dataset of its kind, containing 2000 experimentally characterized donor-acceptor pairs. Using this dataset, we develop the Organic Photovoltaic Classifier (OPVC) to predict whether a material exhibits OPV behavior, and a hierarchical graph neural network that incorporates multi-task learning and donor-acceptor interaction modeling. This framework includes the Molecular Orbital Energy Estimator (MOE2) for predicting HOMO and LUMO energy levels, and the Photovoltaic Performance Predictor (P3) for estimating PCE. In addition, we introduce the Material Generative Pretrained Transformer (MatGPT) to produce synthetically accessible organic semiconductors, guided by a reinforcement learning strategy with three-objective policy optimization. By linking molecular representation learning with performance prediction, our framework advances data-driven discovery of high-performance OPV materials.