AI Summary
This work addresses the difficulty of integrating multi-omics data for synthetic lethality (SL) prediction and the limited performance of existing methods in both pan-cancer and single-cancer settings, in particular the problem of "modality laziness" caused by differing convergence speeds across data modalities. To overcome these limitations, the authors propose SynLeaF, a framework that uses a two-stage training strategy. In the first stage, adaptive single-modality teacher models extract features from gene expression, mutation, methylation, and copy number variation (CNV) data. The second stage integrates these features via a VAE-based cross-encoder that combines feature-level knowledge distillation with a product-of-experts mechanism, while a relational graph convolutional network incorporates gene interaction information from a knowledge graph. Across 19 evaluation scenarios covering eight cancer types and a pan-cancer dataset, SynLeaF outperforms existing methods in 17, and ablation and gradient analyses support the contribution of the fusion and distillation mechanisms to robustness and generalization.
Abstract
Accurate prediction of synthetic lethality (SL) is important for guiding the development of cancer drugs and therapies. SL prediction faces significant challenges in the effective fusion of heterogeneous multi-source data. Existing multimodal methods often suffer from "modality laziness" due to disparate convergence speeds across modalities, which hinders the exploitation of complementary information. This is also one reason why most existing SL prediction models cannot perform well on both pan-cancer and single-cancer SL pair prediction. In this study, we propose SynLeaF, a dual-stage multimodal fusion framework for SL prediction across pan-cancer and single-cancer contexts. The framework employs a VAE-based cross-encoder with a product-of-experts mechanism to fuse four omics data types (gene expression, mutation, methylation, and CNV), while simultaneously utilizing a relational graph convolutional network to capture structured gene representations from biomedical knowledge graphs. To mitigate modality laziness, SynLeaF introduces a dual-stage training mechanism employing feature-level knowledge distillation with adaptive uni-modal teacher and ensemble strategies. In extensive experiments across eight specific cancer types and a pan-cancer dataset, SynLeaF achieves superior performance in 17 out of 19 scenarios. Ablation studies and gradient analyses further validate the critical contributions of the proposed fusion and distillation mechanisms to model robustness and generalization. To facilitate community use, a web server is available at https://synleaf.bioinformatics-lilab.cn.
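As an illustration of the product-of-experts idea mentioned above, the following is a minimal sketch of how Gaussian posteriors from several modality encoders are commonly fused in multimodal VAEs: precisions (inverse variances) add, and the fused mean is the precision-weighted average of the expert means. The abstract does not give SynLeaF's exact formulation, so the function name `product_of_experts`, the optional standard-normal prior expert, and the array shapes are assumptions made for this example only.

```python
import numpy as np

def product_of_experts(mus, logvars, include_prior=True):
    """Fuse diagonal Gaussian experts N(mu_i, var_i) into a single Gaussian.

    Hypothetical helper for illustration; not taken from SynLeaF.
    mus, logvars: array-likes of shape (n_modalities, latent_dim).
    Returns the fused mean and variance, each of shape (latent_dim,).
    """
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    if include_prior:
        # Assumed standard-normal prior expert: mean 0, log-variance 0.
        mus = np.vstack([np.zeros((1, mus.shape[1])), mus])
        logvars = np.vstack([np.zeros((1, logvars.shape[1])), logvars])
    precisions = np.exp(-logvars)                 # 1 / var_i per expert
    fused_var = 1.0 / precisions.sum(axis=0)      # precisions add up
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, fused_var

# Two one-dimensional experts with unit variance and means 0 and 2.
mu, var = product_of_experts([[0.0], [2.0]], [[0.0], [0.0]], include_prior=False)
print(mu, var)  # fused mean 1.0, fused variance 0.5
```

Under this formulation a confident modality (small variance) dominates the fused posterior, while an uninformative one contributes little, which is one reason product-of-experts fusion is a common choice when some omics types are noisy or missing.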