CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction

📅 2025-08-05
🤖 AI Summary
Drug toxicity prediction faces three challenges: heavy reliance on labeled data, poor interpretability, and difficulty modeling organ-specific mechanisms. To address them, we propose CoTox, a framework that integrates biological context (e.g., signaling pathways, Gene Ontology terms) with molecular structure information (e.g., IUPAC names) into the chain-of-thought reasoning of large language models (e.g., GPT-4o), enabling transparent, physiologically consistent multi-organ toxicity prediction. CoTox removes the need for large-scale annotated molecular datasets while improving both predictive accuracy and interpretability, and it outperforms conventional machine learning and deep learning baselines across diverse toxicity prediction tasks. Its biological plausibility is illustrated by a case study that simulates cell-type-specific drug responses. The implementation, including code and prompt templates, is publicly available.

📝 Abstract
Drug toxicity remains a major challenge in pharmaceutical development. Recent machine learning models have improved in silico toxicity prediction, but their reliance on annotated data and lack of interpretability limit their applicability, in particular their ability to capture organ-specific toxicities driven by complex biological mechanisms. Large language models (LLMs) offer a promising alternative through step-by-step reasoning and integration of textual data, yet prior approaches lack biological context and a transparent rationale. To address this issue, we propose CoTox, a novel framework that integrates LLMs with chain-of-thought (CoT) reasoning for multi-toxicity prediction. CoTox combines chemical structure data, biological pathways, and Gene Ontology (GO) terms to generate interpretable toxicity predictions through step-by-step reasoning. Using GPT-4o, we show that CoTox outperforms both traditional machine learning and deep learning models. We further examine its performance across various LLMs to identify where CoTox is most effective. Additionally, we find that representing chemical structures with IUPAC names, which are easier for LLMs to understand than SMILES, enhances the model's reasoning ability and improves predictive performance. To demonstrate its practical utility in drug development, we simulate the treatment of relevant cell types with the drug and incorporate the resulting biological context into the CoTox framework. This approach allows CoTox to generate toxicity predictions aligned with physiological responses, as shown in a case study. These results highlight the potential of LLM-based frameworks to improve interpretability and support early-stage drug safety assessment. The code and prompts used in this work are available at https://github.com/dmis-lab/CoTox.
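
CoTox's core step is a prompt that places a compound's IUPAC name next to its biological context (pathways and GO terms) and asks the LLM to reason step by step before predicting each toxicity endpoint. The sketch below shows one way to assemble such a prompt. It assumes a hypothetical `CompoundContext` container and a hypothetical `build_cot_prompt` helper; the wording, section order, and endpoint names are illustrative assumptions, not the template published in the CoTox repository, and no LLM is called. The acetaminophen inputs are illustrative and are not drawn from the paper's dataset.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundContext:
    """Hypothetical container for the inputs combined for one compound."""
    iupac_name: str
    pathways: list[str] = field(default_factory=list)
    go_terms: list[str] = field(default_factory=list)


def build_cot_prompt(ctx: CompoundContext, endpoints: list[str]) -> str:
    """Assemble a step-by-step (chain-of-thought) toxicity prompt.

    Illustrative wording only (an assumption), not the CoTox template.
    """
    lines = [
        "You are an expert toxicologist.",
        # Structure is given as an IUPAC name rather than SMILES.
        f"Compound (IUPAC name): {ctx.iupac_name}",
        "Associated biological pathways:",
        *[f"- {p}" for p in ctx.pathways],
        "Associated Gene Ontology (GO) terms:",
        *[f"- {g}" for g in ctx.go_terms],
        # Ask for explicit intermediate reasoning before the labels.
        "Reason step by step: first relate the chemical structure to known",
        "toxicophores, then relate the pathways and GO terms to organ-level",
        "mechanisms, and only then give a prediction.",
        "Finish with one line per endpoint: '<endpoint>: Yes' or '<endpoint>: No'.",
        "Endpoints: " + ", ".join(endpoints),
    ]
    return "\n".join(lines)


# Illustrative inputs (hypothetical, not from the paper's dataset).
acetaminophen = CompoundContext(
    iupac_name="N-(4-hydroxyphenyl)acetamide",
    pathways=["Drug metabolism - cytochrome P450"],
    go_terms=["GO:0006805 xenobiotic metabolic process"],
)
example_prompt = build_cot_prompt(acetaminophen, ["hepatotoxicity", "cardiotoxicity"])
print(example_prompt)
```

Keeping the structure and each kind of biological context in separately labeled sections would make it straightforward to substitute context derived from simulated cell-type responses without changing the rest of the prompt.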

Problem

Research questions and friction points this paper is trying to address.

Predicting drug toxicity with interpretable reasoning
Integrating biological context for organ-specific toxicity
Enhancing LLM-based toxicity prediction using chemical data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs with chain-of-thought reasoning
Combines chemical structure, biological pathway, and Gene Ontology data
Uses IUPAC names for better LLM understanding
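
Because an LLM states its toxicity predictions at the end of free-text reasoning, scoring CoTox against labeled data (as in the comparison with machine learning and deep learning baselines) requires extracting one label per endpoint. The sketch below assumes the model was asked to finish with lines of the form `<endpoint>: Yes` or `<endpoint>: No`; that output format and the `parse_predictions` helper are hypothetical, not necessarily what the CoTox prompts use. Later matching lines override earlier ones so the final answer wins over any label-like line inside the reasoning, and endpoints with no matching line map to `None` so they can be counted as parse failures instead of being silently scored as negative.

```python
import re
from typing import Dict, List, Optional

# Hypothetical output format: "<endpoint>: Yes" or "<endpoint>: No" on its own line.
_LABEL_LINE = re.compile(r"^\s*([A-Za-z][\w\s-]*?)\s*:\s*(yes|no)\s*\.?\s*$", re.IGNORECASE)


def parse_predictions(response: str, endpoints: List[str]) -> Dict[str, Optional[int]]:
    """Map each requested endpoint to 1 (Yes), 0 (No), or None (not found)."""
    wanted = {e.lower(): e for e in endpoints}  # case-insensitive lookup
    preds: Dict[str, Optional[int]] = {e: None for e in endpoints}
    for line in response.splitlines():
        match = _LABEL_LINE.match(line)
        if match is None:
            continue  # ordinary reasoning text
        key = match.group(1).strip().lower()
        if key in wanted:
            # Later lines overwrite earlier ones, so the final answer wins.
            preds[wanted[key]] = 1 if match.group(2).lower() == "yes" else 0
    return preds


if __name__ == "__main__":
    reply = "Step 1: reasoning here.\nHepatotoxicity: Yes\nCardiotoxicity: no"
    print(parse_predictions(reply, ["hepatotoxicity", "cardiotoxicity", "nephrotoxicity"]))
```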

Authors

Jueon Park, Korea University
Yein Park, Korea University
Minju Song, Department of Computer Science and Engineering, Korea University, Seoul 17035, Republic of Korea
Soyon Park, Korea University
Donghyeon Lee, Department of Computer Science and Engineering, Korea University, Seoul 17035, Republic of Korea; AIGEN Sciences, Seoul 04778, Republic of Korea
Seungheun Baek, Korea University
Jaewoo Kang, Department of Computer Science and Engineering, Korea University, Seoul 17035, Republic of Korea; AIGEN Sciences, Seoul 04778, Republic of Korea