scDrugMap: Benchmarking Large Foundation Models for Drug Response Prediction

📅 2025-05-08
🤖 AI Summary
To address the challenge of cancer drug resistance, this work introduces scDrugMap, the first large-scale benchmark of foundation models for drug response prediction from single-cell data. It evaluates eight single-cell foundation models (e.g., scFoundation, scGPT, UCE) and two large language models (LLaMA, ChatGLM) under two adaptation strategies, layer freezing and LoRA-based fine-tuning, and additionally tests zero-shot inference. Evaluation spans 36 datasets comprising over 340,000 cells (about 326,000 in the primary collection and 18,800 in the validation set) and covers both pooled-data and cross-data settings. In the pooled-data setting, scFoundation achieves the best mean F1 score, 0.971 with layer freezing and 0.947 with fine-tuning, outperforming the lowest-performing model by over 50%. In the cross-data setting, UCE attains a mean F1 of 0.774 after fine-tuning, and scGPT reaches 0.858 in zero-shot learning. The framework is open-sourced with a Python command-line interface and a web server, advancing precision pharmacological modeling at single-cell resolution.

📝 Abstract
Drug resistance presents a major challenge in cancer therapy. Single cell profiling offers insights into cellular heterogeneity, yet the application of large-scale foundation models for predicting drug response in single cell data remains underexplored. To address this, we developed scDrugMap, an integrated framework featuring both a Python command-line interface and a web server for drug response prediction. scDrugMap evaluates a wide range of foundation models, including eight single-cell models and two large language models, using a curated dataset of over 326,000 cells in the primary collection and 18,800 cells in the validation set, spanning 36 datasets and diverse tissue and cancer types. We benchmarked model performance under pooled-data and cross-data evaluation settings, employing both layer freezing and Low-Rank Adaptation (LoRA) fine-tuning strategies. In the pooled-data scenario, scFoundation achieved the best performance, with mean F1 scores of 0.971 (layer freezing) and 0.947 (fine-tuning), outperforming the lowest-performing model by over 50%. In the cross-data setting, UCE excelled post fine-tuning (mean F1: 0.774), while scGPT led in zero-shot learning (mean F1: 0.858). Overall, scDrugMap provides the first large-scale benchmark of foundation models for drug response prediction in single-cell data and serves as a user-friendly, flexible platform for advancing drug discovery and translational research.
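The results above are reported as mean F1 scores. As a minimal sketch of that metric, assuming binary labels (1 for the positive class, 0 otherwise) and a plain unweighted average over datasets, the computation could look like the following. The function names and the averaging scheme are illustrative assumptions; the abstract does not state which F1 variant (binary, macro, or weighted) was used.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for one dataset: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


def mean_f1(datasets):
    """Unweighted mean of per-dataset F1 (assumed averaging scheme).

    `datasets` is a list of (y_true, y_pred) pairs, one per dataset.
    """
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in datasets]
    return sum(scores) / len(scores)
```

Averaging per dataset, rather than pooling all cells before scoring, keeps a single large dataset from dominating the summary number.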
Problem

Research questions and friction points this paper is trying to address.

Evaluating foundation models for drug response prediction in single-cell data
Addressing drug resistance challenges in cancer therapy using large-scale models
Providing a benchmark and platform for drug discovery with diverse datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated framework with CLI and web server
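Evaluates 10 foundation models (eight single-cell models and two large language models) on over 326K primary-collection cells plus 18.8K validation cells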
Employs layer freezing and LoRA fine-tuning
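The two adaptation strategies listed above differ in which parameters are trained. The following is a minimal pure-Python sketch, not scDrugMap's implementation: with layer freezing the pretrained weights stay untouched (typically only a small prediction head is trained), while LoRA keeps a weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B @ A. The class name, shapes, and hyperparameters are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update.

    Hypothetical illustration of the LoRA idea; only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pretrained weights, kept frozen
        self.rank = rank
        self.alpha = alpha
        # A gets a small random init; B starts at zero so the update is
        # initially zero and the layer reproduces the pretrained behaviour.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + (alpha / rank) * B @ A."""
        s = self.alpha / self.rank
        delta = matmul(self.B, self.A)
        return [[w + s * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted linear map to a vector x (list of floats)."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def frozen_params(self):
        return len(self.weight) * len(self.weight[0])

    def trainable_params(self):
        # rank * d_in entries in A plus d_out * rank entries in B
        return self.rank * (len(self.weight) + len(self.weight[0]))
```

The trainable parameter count grows with rank * (d_in + d_out) instead of d_in * d_out, which is why low-rank adaptation is attractive for large pretrained models.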
Qing Wang
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, USA
Yining Pan
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, USA
Minghao Zhou
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, USA
Zijia Tang
Trinity College, Duke University, Durham, NC, USA
Yanfei Wang
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, USA
Guangyu Wang
Houston Methodist
Bioinformatics, Computational biology, AI, epigenetics
Qianqian Song
Assistant Professor, University of Florida
Translational Bioinformatics, Biomedical Informatics, Artificial Intelligence