scMamba: A Scalable Foundation Model for Single-Cell Multi-Omics Integration Beyond Highly Variable Feature Selection

📅 2025-06-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing single-cell multi-omics integration methods often rely on highly variable feature selection and neglect genomic positional information, limiting biological interpretability and scalability. Method: The authors propose scMamba, an integration framework that preserves genomic locus information through a patch-based tokenization strategy, representing each cell as an ordered sequence of genomic segments, and combines a state-space model (SSM) with a dual-encoder architecture. Contrastive learning with cosine-similarity regularization aligns representations across modalities. The design accommodates high-dimensional sparse data without pre-filtering. Contribution/Results: scMamba outperforms prior methods in cell clustering, cell-type annotation, and trajectory inference across multiple real-world multi-omics datasets, and it scales to large atlases, offering a locus-aware approach to single-cell multi-omics analysis.

📝 Abstract

The advent of single-cell multi-omics technologies has enabled the simultaneous profiling of diverse omics layers within individual cells. Integrating such multimodal data provides unprecedented insights into cellular identity, regulatory processes, and disease mechanisms. However, integration remains challenging, as current methods often rely on selecting highly variable genes or peaks during preprocessing, which may inadvertently discard crucial biological information. Here, we present scMamba, a foundation model designed to integrate single-cell multi-omics data without the need for prior feature selection while preserving genomic positional information. scMamba introduces a patch-based cell tokenization strategy that treats genomic regions as words (tokens) and cells as sentences. Building upon the concept of state space duality, scMamba distills rich biological insights from high-dimensional, sparse single-cell multi-omics data. Additionally, our novel contrastive learning approach, enhanced with cosine similarity regularization, enables superior alignment across omics layers compared to traditional methods. Systematic benchmarking across multiple datasets demonstrates that scMamba significantly outperforms state-of-the-art methods in preserving biological variation, aligning omics layers, and enhancing key downstream tasks such as clustering, cell type annotation, and trajectory inference. Our findings position scMamba as a powerful tool for single-cell multi-omics integration, capable of handling large-scale atlases and advancing biological discovery.
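
The "regions as words, cells as sentences" idea can be illustrated with a minimal sketch. It assumes that a cell's feature vector is already ordered by genomic coordinate and that patches are contiguous, fixed-size, and zero-padded at the end. The function name `patchify` and the parameter `patch_size` are hypothetical and are not taken from the paper, whose actual tokenizer may differ.

```python
from typing import List, Sequence


def patchify(cell: Sequence[float], patch_size: int) -> List[List[float]]:
    """Split one cell's feature vector into contiguous fixed-size patches.

    Assumption of this sketch: `cell` is already sorted by genomic
    coordinate, so each patch covers neighbouring genes or peaks.
    The final patch is zero-padded to `patch_size`.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    patches: List[List[float]] = []
    for start in range(0, len(cell), patch_size):
        patch = list(cell[start:start + patch_size])
        # Pad the trailing patch so every token has the same width.
        patch.extend([0.0] * (patch_size - len(patch)))
        patches.append(patch)
    return patches


# Toy cell with 7 features (hypothetical counts), tokenized into patches of 3.
cell = [0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 3.0]
tokens = patchify(cell, patch_size=3)
print(tokens)
```

Because every feature lands in some patch, no highly variable feature selection is needed before tokenization. In a full model each patch would then typically be projected to an embedding before entering the sequence encoder.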

Problem

Research questions and friction points this paper is trying to address.

Integrating single-cell multi-omics data without feature selection

Preserving genomic positional information in data integration

Improving alignment of omics layers for downstream tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Patch-based cell tokenization strategy

State space duality for learning from high-dimensional, sparse multi-omics data

Contrastive learning with cosine regularization
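
As a rough illustration of the last item, the sketch below combines a symmetric InfoNCE-style contrastive loss with a cosine term that pulls together the two embeddings of the same cell. The exact loss used by scMamba is not given in this summary, so the formulation, the function names, and the parameters `temperature` and `reg_weight` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def _diag_cross_entropy(sim: List[List[float]]) -> float:
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(sim)


def contrastive_loss(emb_a: List[Vector], emb_b: List[Vector],
                     temperature: float = 0.1,
                     reg_weight: float = 1.0) -> float:
    """Symmetric InfoNCE plus a (1 - cosine) penalty on matched pairs.

    emb_a[i] and emb_b[i] are assumed to be embeddings of the same cell
    from two omics layers (hypothetical setup for this sketch).
    """
    n = len(emb_a)
    sim = [[cosine(a, b) / temperature for b in emb_b] for a in emb_a]
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    info_nce = 0.5 * (_diag_cross_entropy(sim) + _diag_cross_entropy(sim_t))
    reg = sum(1.0 - cosine(a, b) for a, b in zip(emb_a, emb_b)) / n
    return info_nce + reg_weight * reg


# Two toy cells: aligned pairs should score a lower loss than swapped pairs.
layer_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(layer_a, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(layer_a, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The InfoNCE term separates each cell from the other cells in the batch, while the cosine term directly rewards agreement between a cell's two modality embeddings. How the two are weighted in practice is a tuning choice that this summary does not specify.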
