🤖 AI Summary
Designing high-performance donor–acceptor (D–A) molecular pairs for organic photovoltaics (OPVs) remains challenging due to the vast chemical space and complex structure–property relationships.
Method: This work introduces an integrated framework that combines large-scale graph neural network (GNN) pretraining with GPT-2–driven reinforcement learning (RL) for de novo molecular generation. A preliminary fragment-level analysis identifies the structural motifs that the RL model associates with higher predicted efficiency.
Contribution/Results: The framework generates candidate molecules with predicted power conversion efficiencies (PCEs) approaching 21%, pending experimental validation. The fragment-level analysis yields design guidelines for the research community. We are building what is expected to be the largest open-source OPV dataset to date, with nearly 3,000 D–A pairs. Planned collaborations with experimental teams to synthesize and characterize the AI-designed molecules are intended to feed new data back into the predictive and generative models.
📝 Abstract
Organic photovoltaic (OPV) materials offer a promising avenue toward cost-effective solar energy utilization. However, optimizing donor-acceptor (D-A) combinations to achieve high power conversion efficiency (PCE) remains a significant challenge. In this work, we propose a framework that integrates large-scale pretraining of graph neural networks (GNNs) with a GPT-2 (Generative Pretrained Transformer 2)-based reinforcement learning (RL) strategy to design OPV molecules with potentially high PCE. This approach produces candidate molecules with predicted efficiencies approaching 21%, although further experimental validation is required. Moreover, we conducted a preliminary fragment-level analysis to identify structural motifs recognized by the RL model that may contribute to enhanced PCE, thus providing design guidelines for the broader research community. To facilitate continued discovery, we are building the largest open-source OPV dataset to date, expected to include nearly 3,000 donor-acceptor pairs. Finally, we discuss plans to collaborate with experimental teams on synthesizing and characterizing AI-designed molecules, which will provide new data to refine and improve our predictive and generative models.
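The abstract does not say which RL algorithm is used, so the following is only a minimal sketch of the general idea of steering a generator with a property score, using plain REINFORCE with a moving-average baseline as a stand-in. Everything in it is an illustrative assumption: the four-token vocabulary, the per-position softmax table (used here in place of GPT-2), and `surrogate_score` (a toy function used in place of the pretrained GNN predictor, not a PCE model).

```python
import math
import random

# Hypothetical fragment tokens; a real generator would emit SMILES-like tokens.
VOCAB = ["T", "B", "F", "N"]
SEQ_LEN = 3


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surrogate_score(tokens):
    """Toy reward: fraction of 'T' tokens. Stand-in for a learned predictor."""
    return tokens.count("T") / len(tokens)


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE on an independent per-position categorical policy."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in range(SEQ_LEN)]
    baseline = 0.0  # moving average of past rewards, reduces variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample one token index per position from the current policy.
        picks = [rng.choices(range(len(VOCAB)), weights=p)[0] for p in probs]
        reward = surrogate_score([VOCAB[i] for i in picks])
        advantage = reward - baseline
        for pos, (p, chosen) in enumerate(zip(probs, picks)):
            for k in range(len(VOCAB)):
                # Gradient of log-softmax: one-hot(chosen) minus probabilities.
                grad_logp = (1.0 if k == chosen else 0.0) - p[k]
                logits[pos][k] += lr * advantage * grad_logp
        baseline = 0.9 * baseline + 0.1 * reward
    return [softmax(row) for row in logits]


final_probs = train()
```

In this toy setting the policy shifts probability mass toward the token that the score rewards. In the framework described above, the analogous signal would come from the pretrained predictor scoring generated D-A pairs.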