scpFormer: A Foundation Model for Unified Representation and Integration of the Single-Cell Proteomics

📅 2026-04-21

🤖 AI Summary
Single-cell proteomic data are difficult to integrate because targeted antibody panels are fragmented, so different datasets measure different sets of proteins. This work proposes scpFormer, a Transformer-based foundation model that maps variable antibody panels into a shared semantic space by combining continuous, sequence-anchored tokenization, Evolutionary Scale Modeling (ESM) protein embeddings, and value-aware expression encoding. Its panel-agnostic tokenization and open-vocabulary architecture support unsupervised clustering, cross-batch integration, and in silico panel expansion, which assists in reconstructing biological manifolds in sparse clinical datasets. The authors report competitive performance in large-scale batch integration and show that the learned protein co-expression patterns transfer to bulk-omics tasks such as cancer drug response prediction.

📝 Abstract
The integration of single-cell proteomic data is often hindered by the fragmented nature of targeted antibody panels. To address this limitation, we introduce scpFormer, a transformer-based foundation model designed for single-cell proteomics. Pre-trained on over 390 million cells, scpFormer replaces standard index-based tokenization with a continuous, sequence-anchored approach. By combining Evolutionary Scale Modeling (ESM) with value-aware expression embeddings, it dynamically maps variable panels into a shared semantic space without artificial discretization. We demonstrate that scpFormer generates global cell representations that perform competitively in large-scale batch integration and unsupervised clustering. Moreover, its open-vocabulary architecture facilitates in silico panel expansion, assisting in the reconstruction of biological manifolds in sparse clinical datasets. Finally, this learned protein co-expression logic is transferable to bulk-omics tasks, supporting applications like cancer drug response prediction. scpFormer provides a versatile, panel-agnostic framework to facilitate scalable biomarker discovery and precision oncology.
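
To make the tokenization idea concrete, here is a minimal sketch of how a continuous, sequence-anchored token could be built. It is an illustration under stated assumptions, not the authors' implementation: the dimensions `D_SEQ` and `D_MODEL`, the random stand-in for ESM embeddings, the `log1p`/`tanh` value encoder, the additive combination, and the function name `tokenize_cell` are all hypothetical. The point it shows is that protein identity comes from a sequence-derived vector rather than a vocabulary index, so cells measured on different panels land in the same token space.

```python
import numpy as np

rng = np.random.default_rng(0)

D_SEQ = 16    # hypothetical size of a precomputed protein-sequence embedding
D_MODEL = 8   # hypothetical model width

# Stand-in for ESM-style sequence embeddings: random vectors keyed by protein name.
# A real pipeline would obtain these from a protein language model instead.
seq_embedding = {
    name: rng.normal(size=D_SEQ)
    for name in ["CD3", "CD4", "CD8", "CD19", "CD45"]
}

# Hypothetical learnable parameters, randomly initialised for the sketch.
W_seq = rng.normal(scale=0.1, size=(D_SEQ, D_MODEL))  # projects sequence embeddings
w_val = rng.normal(scale=0.1, size=(1, D_MODEL))      # lifts a scalar expression value
b_val = np.zeros(D_MODEL)


def tokenize_cell(panel, expression):
    """Map one cell, measured on an arbitrary panel, to (n_proteins, D_MODEL) tokens."""
    if len(panel) != len(expression):
        raise ValueError("panel and expression must have the same length")
    # Identity part: anchored to the protein sequence, not to a vocabulary index.
    identity = np.stack([seq_embedding[p] for p in panel]) @ W_seq
    # Value part: the expression stays continuous (no binning into discrete tokens).
    values = np.log1p(np.asarray(expression, dtype=float))[:, None]
    value_part = np.tanh(values @ w_val + b_val)
    return identity + value_part


# Two cells from different panels end up in the same D_MODEL-dimensional space.
tokens_a = tokenize_cell(["CD3", "CD4", "CD8"], [12.0, 0.0, 3.5])
tokens_b = tokenize_cell(["CD19", "CD45"], [7.0, 20.0])
print(tokens_a.shape, tokens_b.shape)
```

Because nothing in `tokenize_cell` depends on a fixed vocabulary, a protein absent from earlier panels only needs a sequence embedding to be tokenized, which is the property an open-vocabulary design relies on.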
Problem

Research questions and friction points this paper is trying to address.

single-cell proteomics
data integration
antibody panels
fragmentation
unified representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
single-cell proteomics
continuous tokenization
panel-agnostic integration
protein co-expression