Task-Specific Sparse Feature Masks for Molecular Toxicity Prediction with Chemical Language Models

📅 2025-12-12

🤖 AI Summary
In drug discovery, accurate and interpretable molecular toxicity prediction remains challenging because conventional black-box models offer no verifiable structural rationale for their outputs. To address this, we propose a multitask Transformer framework built around a task-specific sparse attention masking mechanism. Trained end-to-end under L1 regularization, it jointly optimizes prediction and attribution by identifying the molecular fragments relevant to each toxicity endpoint. The architecture couples a shared chemical language encoder with task-specific sparse attention modules, balancing generalization and structural interpretability. Evaluated on the ClinTox, SIDER, and Tox21 benchmarks, the model consistently outperforms both single-task and standard multitask baselines. It also produces chemically intuitive, fragment-level attribution maps that can inform toxicity analysis and lead compound optimization.

📝 Abstract
Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-of-the-art models remains a significant barrier to adoption, as high-stakes safety decisions demand verifiable structural insights alongside predictive performance. To address this, we propose a novel multi-task learning (MTL) framework designed to jointly enhance accuracy and interpretability. Our architecture integrates a shared chemical language model with task-specific attention modules. By imposing an L1 sparsity penalty on these modules, the framework is constrained to focus on a minimal set of salient molecular fragments for each distinct toxicity endpoint. The resulting framework is trained end-to-end and is readily adaptable to various transformer-based backbones. Evaluated on the ClinTox, SIDER, and Tox21 benchmark datasets, our approach consistently outperforms both single-task and standard MTL baselines. Crucially, the sparse attention weights provide chemically intuitive visualizations that reveal the specific fragments influencing predictions, thereby enhancing insight into the model's decision-making process.
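
The abstract does not spell out the training objective, so the following is only one plausible way to write it, not the authors' stated formula. All symbols are assumptions introduced here: \theta for the shared encoder parameters, \phi_t for the parameters of the task-specific module for endpoint t, \mathbf{a}_t for that module's weights over the n tokens of a molecule, \mathcal{L}_t for a per-task classification loss such as binary cross-entropy, and \lambda for the strength of the sparsity penalty.

```latex
\mathcal{L}(\theta, \phi_1, \dots, \phi_T)
  = \sum_{t=1}^{T} \Big[ \mathcal{L}_t\big(\hat{y}_t, y_t\big)
      + \lambda \, \lVert \mathbf{a}_t \rVert_1 \Big],
\qquad
\lVert \mathbf{a}_t \rVert_1 = \sum_{i=1}^{n} \lvert a_{t,i} \rvert
```

Under this reading, a larger \lambda pushes more entries of each \mathbf{a}_t toward zero, so each endpoint is explained by fewer fragments, at some possible cost in predictive loss.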
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-task learning framework for molecular toxicity prediction
Enhances model interpretability via sparse attention on molecular fragments
Improves accuracy and provides chemically intuitive prediction visualizations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning with task-specific attention modules
L1 sparsity penalty for minimal salient molecular fragments
End-to-end training adaptable to transformer-based backbones
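
The three points above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the linear gate and head, and the use of sigmoid gates are all assumptions made for illustration. Sigmoid gates are used because softmax-normalized weights always sum to 1, which would make an L1 penalty constant; whether the paper uses gates of this kind is not stated here. The token embeddings stand in for the output of a shared chemical language encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def task_gates(token_embs: Sequence[Vector], gate_w: Vector, gate_b: float) -> Vector:
    """One gate in (0, 1) per token, from a hypothetical linear scorer."""
    return [sigmoid(sum(w * e for w, e in zip(gate_w, emb)) + gate_b)
            for emb in token_embs]


def masked_pool(token_embs: Sequence[Vector], gates: Vector) -> Vector:
    """Gate-weighted average of the token embeddings."""
    total = sum(gates) + 1e-9  # guard against division by zero
    dim = len(token_embs[0])
    return [sum(g * emb[d] for g, emb in zip(gates, token_embs)) / total
            for d in range(dim)]


def bce(p: float, y: float) -> float:
    """Binary cross-entropy for a single prediction."""
    eps = 1e-9
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))


def multitask_loss(
    token_embs: Sequence[Vector],
    tasks: Sequence[Dict],
    labels: Sequence[float],
    lam: float = 0.01,
) -> Tuple[float, List[Vector]]:
    """Sum over tasks of BCE plus lam times the L1 norm of that task's gates."""
    total = 0.0
    all_gates: List[Vector] = []
    for task, y in zip(tasks, labels):
        gates = task_gates(token_embs, task["gate_w"], task["gate_b"])
        pooled = masked_pool(token_embs, gates)
        logit = sum(w * h for w, h in zip(task["head_w"], pooled)) + task["head_b"]
        total += bce(sigmoid(logit), y) + lam * sum(abs(g) for g in gates)
        all_gates.append(gates)
    return total, all_gates


if __name__ == "__main__":
    # Three tokens with 2-d embeddings (made-up values) and two toy endpoints.
    embs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    tasks = [
        {"gate_w": [2.0, -2.0], "gate_b": 0.0, "head_w": [1.0, -1.0], "head_b": 0.0},
        {"gate_w": [-1.0, 1.0], "gate_b": 0.5, "head_w": [0.5, 0.5], "head_b": -0.1},
    ]
    loss, gates = multitask_loss(embs, tasks, labels=[1.0, 0.0], lam=0.1)
    print(loss, gates)
```

The per-task gate vectors returned alongside the loss are what one would visualize over the molecule's tokens. In a real model the gate and head parameters would be learned by gradient descent together with the encoder.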
Authors
Kwun Sy Lee
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong
Jiawei Chen
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong
Fuk Sheng Ford Chung
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong
Tianyu Zhao
Faculty of Engineering, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Zhenyuan Chen
Nankai University
Computer Vision, Vision-Language Models
Debby D. Wang
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong