🤖 AI Summary
This work addresses the challenge of interpretable drug recommendation. We propose KEDRec-LM, a large language model (LLM) trained via knowledge-distillation-driven instruction fine-tuning, and introduce expRxRec, the first publicly available, multi-source heterogeneous dataset integrating knowledge graphs, clinical trial records, and PubMed literature. Methodologically, we pioneer the integration of knowledge distillation with instruction tuning, leveraging drug graph embeddings, clinical text encodings, and PubMed semantic alignment to jointly generate accurate drug recommendations and natural-language medical explanations. Experimental results demonstrate that KEDRec-LM significantly outperforms existing baselines in both recommendation accuracy and rationale plausibility. Both the expRxRec dataset and the KEDRec-LM model are to be publicly released, establishing a new benchmark and practical toolkit for interpretable biomedical AI research.
📝 Abstract
Drug discovery is a critical task in biomedical natural language processing (NLP), yet explainable drug discovery remains underexplored. Meanwhile, large language models (LLMs) have shown remarkable abilities in natural language understanding and generation. Leveraging LLMs for explainable drug discovery has the potential to improve downstream tasks and real-world applications. In this study, we utilize open-source drug knowledge graphs, clinical trial data, and PubMed publications to construct a comprehensive dataset for the explainable drug discovery task, named expRxRec. Furthermore, we introduce KEDRec-LM, an instruction-tuned LLM that distills knowledge from a rich medical knowledge corpus for drug recommendation and rationale generation. To encourage further research in this area, we will publicly release both the dataset and KEDRec-LM (a copy is attached with this submission).