Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a longstanding challenge in drug-blind cancer drug sensitivity prediction: inferring which therapeutics will be effective against a tumor from its molecular profile when the drug itself was not seen during training. The authors argue that the apparent performance plateau does not stem from limited representational capacity. Instead, the standard metric (global Pearson r) is dominated by between-drug potency differences, and the main constraint on cell-specific prediction is the training distribution rather than the drug encoding. To address this, they propose mechanism-of-action (MoA)-stratified training together with response matching from pilot observations. Across four independent datasets, no drug encoding improves per-drug prediction correlation over cell-line features alone, whereas MoA-stratified training substantially raises per-drug correlation, particularly for targeted kinase inhibitors.

📝 Abstract

Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment channeling mechanism-of-action identity as either a drug feature or a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises per-drug r substantially for targeted kinase inhibitors, because pan-cancer co-training suppresses pathway-specific sensitivity signals. Mechanism-stratified training and response matching from pilot observations provide two deployable strategies that together recover the principal sources of predictive gain in drug-blind sensitivity prediction.

Problem

Research questions and friction points this paper addresses.

drug-blind prediction

cancer drug sensitivity

training distribution

mechanism of action

precision oncology

Innovation

Methods, ideas, or system contributions that make the work stand out.

drug-blind prediction

mechanism-of-action stratification

per-drug Pearson correlation