Task-Specific Sparse Feature Masks for Molecular Toxicity Prediction with Chemical Language Models

📅 2025-12-12

🤖 AI Summary

In drug discovery, accurate and interpretable molecular toxicity prediction remains challenging because conventional black-box models offer no verifiable structural rationale. To address this, we propose a multitask Transformer framework with task-specific sparse attention modules that jointly optimizes prediction and attribution: trained end-to-end under L1 regularization, the modules learn to identify the molecular fragments relevant to each toxicity endpoint. The architecture pairs a shared chemical language encoder with these task-specific modules, balancing generalization and structural interpretability. Evaluated on the ClinTox, SIDER, and Tox21 benchmarks, our model consistently outperforms both single-task and standard multitask baselines. It also produces chemically intuitive, fragment-level attribution maps that can inform toxicity analysis and lead compound optimization.

📝 Abstract

Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-of-the-art models remains a significant barrier to adoption, as high-stakes safety decisions demand verifiable structural insights alongside predictive performance. To address this, we propose a novel multi-task learning (MTL) framework designed to jointly enhance accuracy and interpretability. Our architecture integrates a shared chemical language model with task-specific attention modules. By imposing an L1 sparsity penalty on these modules, the framework is constrained to focus on a minimal set of salient molecular fragments for each distinct toxicity endpoint. The resulting framework is trained end-to-end and is readily adaptable to various transformer-based backbones. Evaluated on the ClinTox, SIDER, and Tox21 benchmark datasets, our approach consistently outperforms both single-task and standard MTL baselines. Crucially, the sparse attention weights provide chemically intuitive visualizations that reveal the specific fragments influencing predictions, thereby enhancing insight into the model's decision-making process.
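
The abstract does not give the exact form of the task-specific attention modules or of the loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes sigmoid gates (one per task and fragment token) instead of softmax attention, since softmax weights always sum to one and an L1 penalty on them would be constant. All names (`task_masks`, `task_queries`, `task_heads`, `lam`) and the toy embeddings are hypothetical; in a real system the token embeddings would come from the shared chemical language encoder.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def task_masks(token_embeddings, task_queries):
    """One gate per (task, token): gate = sigmoid(query . embedding)."""
    return [
        [sigmoid(sum(q * h for q, h in zip(query, emb))) for emb in token_embeddings]
        for query in task_queries
    ]

def pooled(token_embeddings, gates):
    """Gate-weighted sum of token embeddings for a single task."""
    dim = len(token_embeddings[0])
    return [sum(g * emb[d] for g, emb in zip(gates, token_embeddings))
            for d in range(dim)]

def l1_penalty(masks):
    """Sum of absolute gate values over all tasks and tokens."""
    return sum(abs(g) for gates in masks for g in gates)

def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one prediction."""
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

def total_loss(token_embeddings, task_queries, task_heads, labels, lam=0.01):
    """Sum of per-task losses plus lam times the L1 penalty on all gates."""
    masks = task_masks(token_embeddings, task_queries)
    loss = 0.0
    for gates, head, y in zip(masks, task_heads, labels):
        z = pooled(token_embeddings, gates)
        p = sigmoid(sum(w * v for w, v in zip(head, z)))
        loss += bce(p, y)
    return loss + lam * l1_penalty(masks), masks

# Toy stand-ins for encoder outputs: 3 fragment tokens, 2-dimensional embeddings.
H = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
Q = [[4.0, 0.0], [0.0, 4.0]]   # hypothetical per-task query vectors (2 tasks)
W = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical per-task linear heads
y = [1, 0]                     # one binary label per task

loss, masks = total_loss(H, Q, W, y, lam=0.1)
loss_no_penalty, _ = total_loss(H, Q, W, y, lam=0.0)
```

In this sketch each row of `masks` plays the role of a per-task attribution map over fragments: the penalty pushes gates toward zero, so only fragments that reduce the task loss keep a large gate.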

Problem

Research questions and friction points this paper is trying to address.

Develops a multi-task learning framework for molecular toxicity prediction

Enhances model interpretability via sparse attention on molecular fragments

Improves accuracy and provides chemically intuitive prediction visualizations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning with task-specific attention modules

L1 sparsity penalty for minimal salient molecular fragments

End-to-end training adaptable to transformer-based backbones
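
One plausible way to write the objective these points imply is given below as a sketch. The notation is an assumption, not taken from the paper: T is the number of toxicity endpoints, \mathcal{L}_t the prediction loss for task t, m_t the vector of that task's attention weights over molecular fragments, and \lambda the penalty strength.

```latex
\mathcal{L}
  = \sum_{t=1}^{T} \mathcal{L}_t\bigl(y_t, \hat{y}_t\bigr)
  + \lambda \sum_{t=1}^{T} \lVert m_t \rVert_1
```

If the weights m_t were softmax-normalized, \lVert m_t \rVert_1 would equal 1 for every input and the penalty would have no effect, so this form only makes sense for unnormalized weights such as sigmoid gates.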

🔎 Similar Papers

FARM: Functional Group-Aware Representations for Small Molecules