Chemical knowledge-informed framework for privacy-aware retrosynthesis learning

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the privacy risks of pooling raw chemical reaction data for centralized retrosynthesis prediction, this work proposes a privacy-preserving distributed retrosynthesis learning framework. The framework adopts a federated learning paradigm in which raw reaction data remain stored locally and are never shared across clients. It introduces a chemistry-driven model aggregation mechanism: each client model's contribution is quantified adaptively via interpretable reactant property metrics, so that the aggregation weights are grounded in chemical relevance. The framework further integrates domain-informed chemical knowledge embedding with parameter-level differential privacy during local training. Evaluated on the USPTO-50K benchmark, the method achieves an approximately 20% performance improvement over standard FedAvg while keeping raw reaction data local.
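For reference, the FedAvg baseline named in the summary combines client models by averaging their parameters, weighting each client by its local sample count. The snippet below is a minimal sketch of that baseline only, not of the proposed framework; the `Params` layout (parameter name mapped to a flat list of floats) and the function name `fedavg` are illustrative assumptions.

```python
from typing import Dict, List

# Assumed layout for illustration: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], num_samples: List[int]) -> Params:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need one sample count per client model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Each client's weight is its share of all training samples.
    weights = [n / total for n in num_samples]

    averaged: Params = {}
    for name, reference_values in client_params[0].items():
        averaged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(len(reference_values))
        ]
    return averaged


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}]
    print(fedavg(clients, num_samples=[1, 3]))
```

Only model parameters and sample counts cross the client boundary here; the raw reaction records never leave the client.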

📝 Abstract
Chemical reaction data is a pivotal asset, driving advances in competitive fields such as pharmaceuticals, materials science, and industrial chemistry. Its proprietary nature makes it sensitive, as it often embodies confidential insights and competitive advantages that organizations strive to protect. Despite this need for confidentiality, the current standard training paradigm for machine learning-based retrosynthesis gathers reaction data from multiple sources in a single location to train prediction models. This paradigm poses considerable privacy risks: it requires broad data availability across organizational boundaries and frequent data transmission between entities, potentially exposing proprietary information to unauthorized access or interception during storage and transfer. In the present study, we introduce the chemical knowledge-informed framework (CKIF), a privacy-preserving approach for learning retrosynthesis models. CKIF enables distributed training across multiple chemical organizations without compromising the confidentiality of proprietary reaction data. Instead of gathering raw reaction data, CKIF learns retrosynthesis models through iterative, chemical knowledge-informed aggregation of model parameters. In particular, the chemical properties of predicted reactants are used to quantitatively assess the observable behavior of each individual model, which in turn determines the adaptive weights used for model aggregation. On a variety of reaction datasets, CKIF outperforms several strong baselines by a clear margin (e.g., ~20% performance improvement over FedAvg on USPTO-50K), demonstrating its feasibility and effectiveness and encouraging further research on privacy-preserving retrosynthesis.
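The idea of turning the chemical properties of predicted reactants into adaptive aggregation weights can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: character n-grams of SMILES strings stand in for a real molecular fingerprint, Tanimoto similarity serves as the property-based score, and a softmax with a hypothetical `temperature` parameter converts scores into weights. All function names are made up for this example.

```python
import math
from typing import List, Set


def ngram_features(smiles: str, n: int = 2) -> Set[str]:
    """Toy stand-in for a molecular fingerprint: the set of character n-grams."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def model_score(predicted: List[str], reference: List[str]) -> float:
    """Mean similarity between a model's predicted reactants and reference reactants."""
    pairs = list(zip(predicted, reference))
    if not pairs:
        raise ValueError("need at least one prediction/reference pair")
    sims = [tanimoto(ngram_features(p), ngram_features(r)) for p, r in pairs]
    return sum(sims) / len(sims)


def adaptive_weights(scores: List[float], temperature: float = 0.1) -> List[float]:
    """Softmax over scores (assumed normalization): higher score, larger weight."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    reference = ["CC(=O)O", "OCC"]            # reactants held locally by one client
    model_a_preds = ["CC(=O)O", "OCC"]        # hypothetical predictions of model A
    model_b_preds = ["CCCC", "c1ccccc1"]      # hypothetical predictions of model B
    scores = [model_score(model_a_preds, reference),
              model_score(model_b_preds, reference)]
    print(adaptive_weights(scores))
```

The resulting weights would then replace the sample-count weights in a parameter average, so a model whose predictions look chemically closer to a client's own reactions contributes more to that client's aggregate.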
Problem

Research questions and friction points this paper is trying to address.

Privacy-preserving retrosynthesis learning
Distributed training across organizations
Chemical knowledge-informed model aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed training without data sharing
Chemical knowledge-informed model aggregation
Privacy-preserving retrosynthesis learning framework
Authors

Guikun Chen
Zhejiang University
Computer Vision, Artificial Intelligence, AI4Science

Xu Zhang
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310058, Zhejiang, China

Yi Yang
College of Computer Science and Technology, Zhejiang University, Hangzhou, 310058, Zhejiang, China

Wenguan Wang
Zhejiang University
Neural-Symbolic AI, Embodied AI, Autonomous Cars, Computer Vision, Artificial Intelligence