Task-Specific Sparse Feature Masks for Molecular Toxicity Prediction with Chemical Language Models

2025-12-12
AI Summary
In drug discovery, accurate and interpretable molecular toxicity prediction remains challenging because conventional black-box models lack verifiable structural rationales. To address this, we propose a multitask Transformer framework with a task-specific sparse attention masking mechanism. Trained end to end under L1 regularization, it jointly optimizes prediction and attribution by identifying toxicity-relevant molecular fragments. The architecture couples a shared chemical language encoder with task-specific sparse attention modules, balancing generalization and structural interpretability. Evaluated on the ClinTox, SIDER, and Tox21 benchmarks, the model consistently outperforms both single-task and standard multitask baselines. It also generates chemically intuitive, fragment-level attribution maps that offer structural insight for toxicity analysis and lead compound optimization.

Abstract
Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-of-the-art models remains a significant barrier to adoption, as high-stakes safety decisions demand verifiable structural insights alongside predictive performance. To address this, we propose a novel multi-task learning (MTL) framework designed to jointly enhance accuracy and interpretability. Our architecture integrates a shared chemical language model with task-specific attention modules. By imposing an L1 sparsity penalty on these modules, the framework is constrained to focus on a minimal set of salient molecular fragments for each distinct toxicity endpoint. The resulting framework is trained end-to-end and is readily adaptable to various transformer-based backbones. Evaluated on the ClinTox, SIDER, and Tox21 benchmark datasets, our approach consistently outperforms both single-task and standard MTL baselines. Crucially, the sparse attention weights provide chemically intuitive visualizations that reveal the specific fragments influencing predictions, thereby enhancing insight into the model's decision-making process.
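One way to read the objective described above, written out as a sketch. The symbols are assumptions for illustration, since this summary does not give the paper's exact formulation: $T$ toxicity endpoints, a per-task prediction loss $\mathcal{L}_t$, task-specific attention weights $\mathbf{a}_t$ over molecular fragments, and a penalty strength $\lambda$.

```latex
\mathcal{L}_{\text{total}}
  = \sum_{t=1}^{T} \mathcal{L}_t\bigl(\hat{y}_t, y_t\bigr)
  + \lambda \sum_{t=1}^{T} \lVert \mathbf{a}_t \rVert_1
```

Under this reading, a larger $\lambda$ pushes more entries of each $\mathbf{a}_t$ toward zero, so each endpoint attends to fewer fragments, possibly at some cost in predictive fit.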
Problem

Research questions and friction points this paper is trying to address.

Develops a multi-task learning framework for molecular toxicity prediction
Enhances model interpretability via sparse attention on molecular fragments
Improves accuracy and provides chemically intuitive prediction visualizations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning with task-specific attention modules
L1 sparsity penalty for minimal salient molecular fragments
End-to-end training adaptable to transformer-based backbones
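The three points above can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-token sigmoid gate, the weighted-sum pooling, the linear head, and all names (`task_forward`, `multitask_loss`, `l1_strength`) are hypothetical, and the token embeddings stand in for the output of a shared chemical language encoder.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def task_forward(token_embeddings, gate_vector, head_vector):
    """One hypothetical task module: gate each token, pool, then score."""
    # Per-token weights in (0, 1); the L1 penalty below pushes them toward 0.
    weights = [sigmoid(dot(h, gate_vector)) for h in token_embeddings]
    dim = len(token_embeddings[0])
    # Weighted sum of the shared token embeddings.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, token_embeddings))
        for d in range(dim)
    ]
    prob = sigmoid(dot(pooled, head_vector))
    return weights, prob


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for a single prediction."""
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1 - prob + eps))


def multitask_loss(token_embeddings, tasks, labels, l1_strength):
    """Sum over tasks of prediction loss plus an L1 penalty on the weights."""
    total = 0.0
    weights_by_task = {}
    for name, (gate_vector, head_vector) in tasks.items():
        weights, prob = task_forward(token_embeddings, gate_vector, head_vector)
        l1 = sum(abs(w) for w in weights) / len(weights)  # mean |weight|
        total += bce(prob, labels[name]) + l1_strength * l1
        weights_by_task[name] = weights
    return total, weights_by_task


# Toy usage: three "tokens" with 2-d embeddings and two made-up endpoints.
embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_tasks = {
    "endpoint_a": ([2.0, 0.0], [1.0, -1.0]),
    "endpoint_b": ([0.0, 2.0], [0.5, 0.5]),
}
toy_labels = {"endpoint_a": 1, "endpoint_b": 0}
loss, weights_by_task = multitask_loss(embeddings, toy_tasks, toy_labels, 0.1)
print(loss, weights_by_task)
```

Because every task reads the same embeddings but owns its gate and head, the per-task weights can be inspected separately, which is the property the attribution visualizations rely on. In a trained model the gradient of the L1 term is what would drive most weights toward zero; this sketch only evaluates the loss.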
Authors
Kwun Sy Lee
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong
Jiawei Chen
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong
Fuk Sheng Ford Chung
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong
Tianyu Zhao
Faculty of Engineering, Hong Kong Polytechnic University, Hung Hom, Hong Kong
Zhenyuan Chen
Nankai University
Computer Vision, Vision-Language Models
Debby D. Wang
School of Science and Technology, Hong Kong Metropolitan University, Ho Man Tin, Hong Kong