🤖 AI Summary
To address data silos and privacy constraints in AI-driven drug discovery, this work proposes the first distributed molecular generation framework that integrates discrete denoising diffusion models with federated learning, implemented on top of OpenFL. The method combines graph neural networks with SMILES sequence modeling so that institutions can train a shared model collaboratively without exchanging raw molecular data. Embedding the discrete diffusion process in the federated setting lets the model meet strict data-locality requirements without giving up generation quality. In experiments, the generated molecules reach 98.7% validity and over 99.2% uniqueness, on par with a centrally trained baseline, while each institution's data remains private. This moves privacy-compliant AI for molecular design closer to practice by easing data access, a critical bottleneck in real-world pharmaceutical applications.
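The summary does not say how collaborator updates are combined. The sketch below assumes FedAvg-style weighted averaging, which is a common choice but is not confirmed here as the paper's setup. Under that assumption, each site trains locally and sends only its parameters. The aggregator then averages each parameter in proportion to the site's local sample count. The `Weights` representation and the `federated_average` function are hypothetical illustrations, not OpenFL's API.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(updates: Sequence[Weights], num_samples: Sequence[int]) -> Weights:
    """FedAvg-style aggregation: weight each collaborator by its local sample count.

    Only parameters reach the aggregator; raw molecules never leave a site.
    """
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need one sample count per collaborator update")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, first in updates[0].items():
        acc = [0.0] * len(first)
        for weights, n in zip(updates, num_samples):
            tensor = weights[name]
            if len(tensor) != len(first):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            # Accumulate this collaborator's contribution, scaled by its data share.
            for i, value in enumerate(tensor):
                acc[i] += value * n / total
        averaged[name] = acc
    return averaged


# Two hypothetical sites: site B holds three times as much data as site A.
site_a = {"layer.weight": [1.0, 2.0]}
site_b = {"layer.weight": [3.0, 4.0]}
global_weights = federated_average([site_a, site_b], num_samples=[100, 300])
print(global_weights)  # {'layer.weight': [2.5, 3.5]}
```

Weighting by sample count keeps a small site from pulling the global model as hard as a large one. A real deployment would run this step inside the federated framework's own aggregator rather than in user code.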
📝 Abstract
Generating unique molecules with biochemically desired properties to serve as viable drug candidates is a difficult task that requires specialized domain expertise. In recent years, diffusion models have shown promising results in accelerating the drug design process through AI-driven molecular generation. However, training these models requires large amounts of data, which are often isolated in proprietary silos. OpenFL is a federated learning framework that enables privacy-preserving collaborative training across such decentralized data sites. In this work, we present a federated discrete denoising diffusion model trained with OpenFL. When evaluated on the uniqueness and validity of the generated molecules, the federated model performs comparably to a model trained on centralized data. This demonstrates the utility of federated learning in the drug design process. OpenFL is available at: https://github.com/securefederatedai/openfl
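The abstract does not define its validity and uniqueness metrics, and conventions vary between benchmarks. The sketch below assumes one common convention:

- **Validity** is the fraction of generated strings that parse into a molecule.
- **Uniqueness** is the fraction of valid molecules whose canonical SMILES are distinct.

The `canonicalize` callable is a hypothetical hook that returns a canonical string, or `None` for invalid input. The `toy_canonicalize` stand-in only strips whitespace and rejects empty strings, so the example runs without a chemistry toolkit.

```python
from typing import Callable, Iterable, Optional, Tuple


def validity_and_uniqueness(
    smiles: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Tuple[float, float]:
    """Return (validity, uniqueness) for a batch of generated SMILES strings.

    validity   = valid samples / all samples
    uniqueness = distinct canonical SMILES / valid samples
    """
    samples = list(smiles)
    if not samples:
        raise ValueError("no samples to evaluate")
    canonical = [canonicalize(s) for s in samples]
    valid = [c for c in canonical if c is not None]
    validity = len(valid) / len(samples)
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


def toy_canonicalize(s: str) -> Optional[str]:
    """Hypothetical stand-in for a chemistry toolkit: rejects only empty strings."""
    s = s.strip()
    return s or None


generated = ["CCO", "CCO", "", "c1ccccc1"]
print(validity_and_uniqueness(generated, toy_canonicalize))  # (0.75, 0.6666666666666666)
```

With RDKit installed, `canonicalize` would typically call `Chem.MolFromSmiles(s)`, which returns `None` for unparseable input, and then `Chem.MolToSmiles(mol)` to obtain a canonical string. Canonicalizing before deduplication matters because one molecule can be written as several different SMILES strings.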