🤖 AI Summary
Single-cell proteomic data are difficult to integrate because targeted antibody panels differ from study to study and each measures only a fragment of the proteome. To address this, the work proposes scpFormer, a Transformer-based foundation model that maps variable antibody panels into a shared semantic space. It does so through continuous, sequence-anchored tokenization that combines Evolutionary Scale Modeling (ESM) protein embeddings with value-aware expression embeddings, avoiding discretization of expression values. With this panel-agnostic tokenization and an open-vocabulary architecture, scpFormer supports unsupervised clustering, cross-batch integration, and in silico panel expansion, and it assists in reconstructing biological manifolds from sparse clinical samples. The authors report that scpFormer performs competitively in large-scale batch integration and that its learned representations transfer to bulk-omics tasks such as cancer drug response prediction.
📝 Abstract
The integration of single-cell proteomic data is often hindered by the fragmented nature of targeted antibody panels. To address this limitation, we introduce scpFormer, a transformer-based foundation model designed for single-cell proteomics. Pre-trained on over 390 million cells, scpFormer replaces standard index-based tokenization with a continuous, sequence-anchored approach. By combining Evolutionary Scale Modeling (ESM) protein embeddings with value-aware expression embeddings, it dynamically maps variable panels into a shared semantic space without artificial discretization. We demonstrate that scpFormer generates global cell representations that perform competitively in large-scale batch integration and unsupervised clustering. Moreover, its open-vocabulary architecture facilitates in silico panel expansion, assisting in the reconstruction of biological manifolds in sparse clinical datasets. Finally, the protein co-expression logic learned during pre-training is transferable to bulk-omics tasks, supporting applications like cancer drug response prediction. scpFormer provides a versatile, panel-agnostic framework to facilitate scalable biomarker discovery and precision oncology.
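To make the idea of continuous, sequence-anchored tokenization more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each measured protein contributes one token, formed by adding a sequence-derived anchor vector to an embedding of the raw expression value. The abstract does not say how the two embeddings are combined, so the element-wise sum, the `log1p`/`tanh` value encoder, the embedding width, and all function names here are hypothetical. The anchor vectors are deterministic pseudo-random stand-ins for real ESM embeddings so that the example runs without model weights.

```python
import math
import random

EMBED_DIM = 8  # hypothetical token width, chosen only for illustration


def mock_sequence_embedding(protein_name, dim=EMBED_DIM):
    """Stand-in for a protein language model embedding (e.g. from ESM).

    A real pipeline would embed the amino-acid sequence. Here the vector is
    derived deterministically from the protein name so the sketch is runnable.
    """
    rng = random.Random(protein_name)  # string seeds are reproducible
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def value_embedding(expression, weights, bias):
    """Map a continuous expression value to a vector without binning it.

    Assumed form: log-transform, then one affine layer with tanh.
    """
    x = math.log1p(max(expression, 0.0))
    return [math.tanh(w * x + b) for w, b in zip(weights, bias)]


def tokenize_cell(panel, weights, bias):
    """Build one token per measured protein: sequence anchor + value embedding.

    `panel` maps protein names to expression values. No fixed vocabulary
    index is used, so panels of any size or composition are accepted.
    """
    tokens = []
    for protein, expression in panel.items():
        anchor = mock_sequence_embedding(protein, len(weights))
        value = value_embedding(expression, weights, bias)
        tokens.append([a + v for a, v in zip(anchor, value)])
    return tokens


# Hypothetical fixed parameters for the value encoder (learned in practice).
_param_rng = random.Random(0)
weights = [_param_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
bias = [_param_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

# Two cells measured with different antibody panels that overlap only on CD4.
panel_a = {"CD3": 12.0, "CD4": 3.5, "CD8": 0.0}
panel_b = {"CD4": 3.5, "CD19": 7.2}

tokens_a = tokenize_cell(panel_a, weights, bias)
tokens_b = tokenize_cell(panel_b, weights, bias)
```

In this sketch the two panels yield token sequences of different lengths, yet the shared marker (CD4 at the same expression level) maps to the same vector in both. That is the property a panel-agnostic tokenizer needs before the tokens are passed to a transformer encoder. Because anchors come from protein identity and not from a vocabulary index, a protein absent from every training panel could still be tokenized, which is the intuition behind the open-vocabulary claim.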