🤖 AI Summary
This study addresses the vulnerability of weakly supervised models to distribution shift when ground-truth labels are unavailable and the supervision mechanism itself varies across environments, a phenomenon the authors formalize as “supervision drift.” Using CRISPR-Cas13d experiments, they construct a non-IID RNA-seq benchmark spanning two human cell lines and multiple time points, in which gRNA efficacy is inferred indirectly and the weak-label construction is held fixed. They show that feature–label relationships remain stable across cell lines but change sharply over time, and propose feature stability as a simple diagnostic for model transferability. Empirically, models achieve moderate in-domain performance (ridge: R²=0.356, ρ=0.442) and partial generalization across cell lines (ρ≈0.40), but fail in cross-time prediction (e.g., XGBoost: R²=−0.155, ρ=0.056).
📝 Abstract
Learning from weak or proxy supervision is common when ground-truth labels are unavailable, yet robustness under distribution shift remains poorly understood, especially when the supervision mechanism itself changes. We formalize this as supervision drift, defined as changes in P(y | x, c) across contexts, and study it in CRISPR-Cas13d experiments where guide efficacy is inferred indirectly from RNA-seq responses. Using data from two human cell lines and multiple time points, we build a controlled non-IID benchmark with explicit domain and temporal shifts while keeping the weak-label construction fixed. Models achieve strong in-domain performance (ridge R^2 = 0.356, Spearman rho = 0.442) and partial cross-cell-line transfer (rho ~ 0.40). However, temporal transfer fails across all models, with negative R^2 and near-zero correlation (e.g., XGBoost R^2 = -0.155, rho = 0.056). Additional analyses confirm this pattern. Feature-label relationships remain stable across cell lines but change sharply over time, indicating that failures arise from supervision drift rather than model limitations. These findings highlight feature stability as a simple diagnostic for detecting non-transferability before deployment.
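One way to picture the feature-stability diagnostic is sketched below. The abstract does not specify how the authors compute it, so this is an assumption: each context gets a "profile" of per-feature Spearman correlations with the weak label, and stability is the Pearson correlation between two contexts' profiles. The function names (`feature_label_profile`, `feature_stability`, `make_context`) and the synthetic data are hypothetical and are not taken from the paper or its benchmark.

```python
import random
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def spearman(a, b):
    """Spearman correlation, computed as Pearson correlation of ranks."""
    return pearson(ranks(a), ranks(b))


def feature_label_profile(X, y):
    """Spearman correlation of each feature column with the weak label."""
    return [spearman([row[j] for row in X], y) for j in range(len(X[0]))]


def feature_stability(X_a, y_a, X_b, y_b):
    """Assumed stability score: agreement of two contexts' profiles."""
    return pearson(feature_label_profile(X_a, y_a),
                   feature_label_profile(X_b, y_b))


def make_context(weights, n, rng):
    """Synthetic context (illustration only): linear label plus noise."""
    X = [[rng.gauss(0, 1) for _ in weights] for _ in range(n)]
    y = [sum(w * x for w, x in zip(weights, row)) + rng.gauss(0, 0.5)
         for row in X]
    return X, y


rng = random.Random(0)
w = [1.0, -0.8, 0.5, 0.0, 0.3]
ctx_a = make_context(w, 300, rng)                 # reference context
ctx_b = make_context(w, 300, rng)                 # same label mechanism
ctx_c = make_context([-v for v in w], 300, rng)   # flipped mechanism (drift)

stable = feature_stability(*ctx_a, *ctx_b)
drifted = feature_stability(*ctx_a, *ctx_c)
print(f"same mechanism: {stable:.3f}, drifted mechanism: {drifted:.3f}")
```

Under these assumptions, a score near 1 suggests that features relate to the weak label in the same way in both contexts, while a score near zero or negative flags a change in P(y | x, c) that would likely defeat a model transferred between them. The check needs only the weak labels already available in each context, which is what makes it usable before deployment.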