KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

📅 2024-10-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
DEL data scarcity severely hinders machine learning-driven drug discovery. To address this, we introduce KinDEL, one of the first large-scale, publicly available DEL datasets, targeting two therapeutically relevant kinases (MAPK14 and DDR1) and featuring paired on-DNA screening and off-DNA biophysical validation data. Methodologically, we integrate high-throughput DEL screening, deep sequencing, molecular graph neural networks, and SE(3)-equivariant structure-aware probabilistic modeling, all validated experimentally. Models trained on KinDEL achieve AUC > 0.89; crucially, on-DNA predictions correlate strongly with off-DNA binding affinities (Pearson *r* = 0.72), supporting both dataset quality and model generalizability. KinDEL and its associated computational framework move DEL modeling from empirical heuristics toward structure- and mechanism-informed prediction, enabling more reliable hit identification and target engagement assessment.
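To make the link between deep sequencing and hit identification concrete, here is a minimal sketch of a count-based enrichment score. It is a generic heuristic often used for DEL selections, not necessarily the scoring used in KinDEL; the function name `enrichment_scores`, the pseudocount, and the compound IDs and counts are all hypothetical.

```python
from math import log2


def enrichment_scores(target_counts, control_counts, pseudocount=1.0):
    """Log2 enrichment of each compound in a target selection relative to a
    control selection, after normalizing for sequencing depth.

    Assumption: both inputs map compound ID -> read count for the same set of
    compounds. The pseudocount keeps zero-count compounds finite.
    """
    if target_counts.keys() != control_counts.keys():
        raise ValueError("both selections must cover the same compounds")

    n = len(target_counts)
    target_total = sum(target_counts.values()) + pseudocount * n
    control_total = sum(control_counts.values()) + pseudocount * n

    scores = {}
    for compound, t in target_counts.items():
        c = control_counts[compound]
        target_freq = (t + pseudocount) / target_total
        control_freq = (c + pseudocount) / control_total
        scores[compound] = log2(target_freq / control_freq)
    return scores


# Hypothetical toy counts, not taken from the KinDEL data.
target = {"cpd_A": 120, "cpd_B": 8, "cpd_C": 10}
control = {"cpd_A": 10, "cpd_B": 9, "cpd_C": 11}

scores = enrichment_scores(target, control)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Compounds with a positive score are over-represented in the target selection relative to the control. Scores like these can serve as regression targets or be thresholded into hit labels, although raw count ratios are noisy at low read depth.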

📝 Abstract
DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at: https://github.com/insitro/kindel.
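As an illustration of the validation step described above, the sketch below compares model scores against biophysical measurements using Pearson and Spearman correlation. It assumes predictions and measurements are available as two aligned lists; the variable names and numbers are hypothetical, and the actual evaluation code lives in the linked repository.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical values: model scores and measured affinities (as pKd)
# for the same five compounds, in the same order.
predicted_score = [2.1, 0.3, 1.4, -0.5, 0.9]
measured_pkd = [7.8, 5.9, 6.1, 5.2, 6.5]

print(f"Pearson:  {pearson(predicted_score, measured_pkd):.3f}")
print(f"Spearman: {spearman(predicted_score, measured_pkd):.3f}")
```

Rank correlation is often the more informative of the two for hit triage, since what matters in practice is whether the model orders compounds correctly rather than whether its scores are linear in affinity.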
Problem

Research questions and friction points this paper is trying to address.

Lack of public DNA-Encoded Library datasets for kinase inhibitors
Need for large-scale DEL data with binding poses for ML
Limited resources for computational exploration of kinase-targeting compounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Largest public DEL dataset for kinases
Includes docking binding poses data
Combines 2D and 3D structure benchmarks
Authors
Benson Chen: Insitro, South San Francisco, CA 94080, USA
Tomasz Danel: Jagiellonian University, insitro (deep learning, computer-aided drug design, generative models)
Patrick J. McEnaney: Insitro, South San Francisco, CA 94080, USA
Nikhil Jain: Nvidia (Parallel Computing)
Kirill Novikov: Insitro, South San Francisco, CA 94080, USA
Spurti U Akki: Insitro, South San Francisco, CA 94080, USA
Joshua L. Turnbull: Insitro, South San Francisco, CA 94080, USA
V. Pandya: Insitro, South San Francisco, CA 94080, USA
B. P. Belotserkovskii: Insitro, South San Francisco, CA 94080, USA
Jared Bryce Weaver: Insitro, South San Francisco, CA 94080, USA
Ankita Biswas: Insitro, South San Francisco, CA 94080, USA
Dat Nguyen: Postdoc - Harvard, Basis Institute (Graph Neural Network, Program Analysis, Software Engineering, Program Synthesis, Computer Vision)
Gabriel H. S. Dreiman: Insitro, South San Francisco, CA 94080, USA
Mohammad M. Sultan: Insitro, South San Francisco, CA 94080, USA
Nathaniel Stanley: Insitro, South San Francisco, CA 94080, USA
Daniel M Whalen: Insitro, South San Francisco, CA 94080, USA
Divya Kanichar: Insitro, South San Francisco, CA 94080, USA
Christoph Klein: Insitro, South San Francisco, CA 94080, USA
Emily Fox: Department of Computer Science, The University of Texas at Dallas (Algorithms, computational geometry, combinatorial optimization)
R. E. Watts: Insitro, South San Francisco, CA 94080, USA