🤖 AI Summary
This work addresses catastrophic forgetting in token-level incremental learning, specifically the forgetting of gene representations during continual learning on single-cell transcriptomic data, where genes serve as learnable “tokens.” We propose the **Gene-Incremental Learning (GIL) paradigm**, the first framework tailored to the characteristics of single-cell data. Methodologically, GIL integrates gene expression feature modeling, dynamic architecture expansion, and dedicated forgetting-mitigation strategies. We further introduce the first standardized GIL benchmark and evaluation protocol. Extensive experiments on multiple large-scale single-cell datasets demonstrate that GIL significantly alleviates gene-level forgetting while ensuring robustness and reproducibility. This work fills a gap in token-level incremental learning within biomedicine and establishes foundational infrastructure and a new paradigm for dynamic multimodal modeling of single-cell omics data.
📝 Abstract
Classes, as fundamental elements of computer vision, have been extensively studied within incremental learning frameworks. In contrast, tokens, which play essential roles in many research fields, grow in a similar way, yet their incremental learning has rarely been investigated. This research gap stems primarily from the holistic nature of tokens in language, which makes it difficult to design incremental learning frameworks for them. To overcome this obstacle, we turn to one type of token, the gene, in large-scale biological data, namely single-cell transcriptomics, to formulate a pipeline for gene incremental learning and establish corresponding evaluations. We found that the forgetting problem also exists in gene incremental learning, so we adapted existing class incremental learning methods to mitigate the forgetting of genes. Through extensive experiments, we demonstrated the soundness of our framework design and evaluations, as well as the effectiveness of our method adaptations. Finally, we provide a complete benchmark for gene incremental learning in single-cell transcriptomics.