🤖 AI Summary
Antimicrobial resistance (AMR) is escalating while the development of novel antibiotics lags critically, creating a need for generalizable computational methods that can discover potent antimicrobials against data-scarce, emerging pathogens. To address this, we present ApexOracle, an AI framework that integrates pathogen-specific contextual information: it constructs strain embeddings from two sources (genomic sequences and scientific literature) and couples them with a discrete diffusion language model for molecular representation and de novo generation. Within a unified architecture, the framework supports both activity prediction and generative design against previously unseen pathogens, including those with little or no antimicrobial activity data. In the reported evaluation, it consistently outperformed state-of-the-art models in activity prediction across diverse bacterial species and chemical modalities. It also generates structurally novel, "new-to-nature" compounds in silico with high predicted efficacy against priority threats. This work offers a scalable, context-aware computational strategy for accelerating anti-AMR drug discovery.
📝 Abstract
Antimicrobial resistance (AMR) is escalating and outpacing current antibiotic development. Thus, discovering antibiotics effective against emerging pathogens is becoming increasingly critical. However, existing approaches cannot rapidly identify effective molecules against novel pathogens or emerging drug-resistant strains. Here, we introduce ApexOracle, an artificial intelligence (AI) model that both predicts the antibacterial potency of existing compounds and designs de novo molecules active against strains it has never encountered. Departing from models that rely solely on molecular features, ApexOracle incorporates pathogen-specific context by integrating molecular features, captured by a foundational discrete diffusion language model, with a dual-embedding framework that combines genomic- and literature-derived strain representations. Across diverse bacterial species and chemical modalities, ApexOracle consistently outperformed state-of-the-art approaches in activity prediction and demonstrated reliable transferability to novel pathogens with little or no antimicrobial data. Its unified representation-generation architecture further enables the in silico creation of "new-to-nature" molecules with high predicted efficacy against priority threats. By pairing rapid activity prediction with targeted molecular generation, ApexOracle offers a scalable strategy for countering AMR and preparing for future infectious-disease outbreaks.
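To make the idea of pathogen-conditioned activity prediction concrete, here is a minimal illustrative sketch. It is not ApexOracle's implementation: the abstract does not say how the genomic and literature embeddings are combined or what prediction head is used. The sketch assumes simple concatenation of L2-normalized vectors followed by a linear scoring head, and all function names (`l2_normalize`, `strain_embedding`, `predict_activity`), vector sizes, and weights are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def strain_embedding(genomic_vec, literature_vec):
    """Assumed fusion: normalize each source, then concatenate.

    The real model may fuse the two strain representations differently.
    """
    return l2_normalize(genomic_vec) + l2_normalize(literature_vec)


def predict_activity(molecule_vec, strain_vec, weights, bias=0.0):
    """Hypothetical linear head over [molecule features ; strain context]."""
    features = list(molecule_vec) + list(strain_vec)
    if len(features) != len(weights):
        raise ValueError("weights must match the fused feature length")
    return sum(w * f for w, f in zip(weights, features)) + bias


# Toy inputs standing in for learned embeddings (illustrative values only).
strain = strain_embedding([3.0, 4.0], [0.0, 2.0])
score = predict_activity([1.0], strain, weights=[1.0, 1.0, 1.0, 1.0, 1.0])
```

The point of the sketch is only that the score depends on the strain vector as well as the molecule vector, so a strain with no activity measurements can still be scored as long as its genome or literature description can be embedded.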