Spatial Coordinates as a Cell Language: A Multi-Sentence Framework for Imaging Mass Cytometry Analysis

📅 2025-06-02
🤖 AI Summary
Existing single-cell large language models (LLMs) struggle to integrate spatial coordinates with cell–cell interaction information, which leaves spatial context poorly modeled and biological relationships between cells largely uncaptured. To address this, we propose Spatial2Sentence, a multi-sentence framework tailored for imaging mass cytometry (IMC) data. The method builds expression-similarity and spatial-distance matrices and uses them to sample positive pairs (spatially adjacent cells with similar expression) and negative pairs (distant cells with dissimilar expression), so that single-cell expression profiles and spatial proximity are jointly encoded as natural-language sequences. Spatial coordinates are thus represented as part of a "cellular language", and interactions between the spatial and expression modalities are modeled explicitly. Combined with multi-task learning and evaluated on a diabetes IMC dataset, Spatial2Sentence improves cell-type classification by 5.98% and clinical status prediction by 4.18% over existing single-cell LLMs, while also improving interpretability.

📝 Abstract
Imaging mass cytometry (IMC) enables high-dimensional spatial profiling by combining the analytical power of mass cytometry with the spatial distribution of cell phenotypes. Recent studies leverage large language models (LLMs) to extract cell states by translating gene or protein expression into biological context. However, existing single-cell LLMs face two major challenges: (1) integrating spatial information, because they struggle to generalize across spatial coordinates and to encode spatial context effectively as text; and (2) treating each cell independently, which overlooks cell-cell interactions and limits their ability to capture biological relationships. To address these limitations, we propose Spatial2Sentence, a novel framework that integrates single-cell expression and spatial information into natural language using a multi-sentence approach. Spatial2Sentence constructs expression-similarity and distance matrices, pairing cells that are spatially adjacent and similar in expression as positive pairs and using cells that are distant and dissimilar as negatives. These multi-sentence representations enable LLMs to learn cellular interactions in both expression and spatial contexts. Equipped with multi-task learning, Spatial2Sentence outperforms existing single-cell LLMs on preprocessed IMC datasets, improving cell-type classification by 5.98% and clinical status prediction by 4.18% on the diabetes dataset while enhancing interpretability. The source code is available at https://github.com/UNITES-Lab/Spatial2Sentence.
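The pair construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_pairs`, the use of cosine similarity and Euclidean distance, and all four thresholds (`near`, `far`, `sim_hi`, `sim_lo`) are assumptions chosen for illustration; the abstract only states that similarity and distance matrices are built and used to select positive and negative pairs.

```python
import numpy as np

def sample_pairs(expr, coords, near=10.0, far=50.0, sim_hi=0.8, sim_lo=0.2):
    """Hypothetical pair sampler (thresholds are illustrative assumptions).

    expr:   (n_cells, n_markers) expression matrix
    coords: (n_cells, 2) spatial coordinates
    Returns two integer arrays of shape (n_pairs, 2) holding (i, j) with i < j.
    """
    expr = np.asarray(expr, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Expression similarity matrix (cosine similarity is an assumption).
    norms = np.linalg.norm(expr, axis=1, keepdims=True)
    unit = expr / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T

    # Pairwise Euclidean distance matrix between cell centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Keep only the upper triangle so each unordered pair appears once.
    upper = np.triu(np.ones(sim.shape, dtype=bool), k=1)

    # Positives: spatially adjacent AND similar in expression.
    pos = np.argwhere(upper & (dist <= near) & (sim >= sim_hi))
    # Negatives: spatially distant AND dissimilar in expression.
    neg = np.argwhere(upper & (dist >= far) & (sim <= sim_lo))
    return pos, neg

# Toy example: two tight clusters with distinct expression profiles.
toy_expr = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_coords = [[0, 0], [1, 0], [100, 100], [101, 100]]
toy_pos, toy_neg = sample_pairs(toy_expr, toy_coords)
print("positive pairs:", toy_pos.tolist())
print("negative pairs:", toy_neg.tolist())
```

Requiring both conditions at once means that pairs which are close but dissimilar, or similar but far apart, fall into neither set. Whether the original method treats such mixed cases the same way is not specified in the abstract.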
Problem

Research questions and friction points this paper is trying to address.

Integrating spatial coordinates into cell language models
Capturing cell-cell interactions in expression and spatial contexts
Improving cell-type classification and clinical status prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-sentence framework integrates spatial and expression data
Builds positive and negative cell pairs from expression-similarity and spatial-distance matrices
Multi-task learning enhances classification and prediction accuracy
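
To make the multi-sentence idea concrete, the sketch below turns a cell into a short sentence and concatenates an anchor cell with its neighbors. Everything here is an assumption for illustration: the sentence template, the rank-ordering of markers by expression, the grid binning of coordinates, the function names `cell_sentence` and `multi_sentence`, and the example marker names are hypothetical and are not taken from the paper or its repository.

```python
def cell_sentence(markers, values, xy, grid=50.0, top_k=3):
    """Render one cell as text (hypothetical template).

    markers: list of marker names
    values:  expression value per marker, same order as `markers`
    xy:      (x, y) coordinate of the cell
    grid:    bin width used to discretize coordinates into coarse tokens
    """
    # Rank markers from highest to lowest expression and keep the top_k.
    order = sorted(range(len(markers)), key=lambda i: -values[i])[:top_k]
    ranked = " ".join(markers[i] for i in order)

    # Discretize coordinates so nearby cells share the same location token.
    gx, gy = int(xy[0] // grid), int(xy[1] // grid)
    return f"Cell at grid x{gx} y{gy} expresses {ranked}."

def multi_sentence(anchor, neighbors, **kwargs):
    """Concatenate the anchor cell's sentence with its neighbors' sentences.

    Each cell is a (markers, values, xy) tuple.
    """
    cells = [anchor] + list(neighbors)
    return " ".join(cell_sentence(m, v, xy, **kwargs) for m, v, xy in cells)

# Illustrative marker panel and cells (made-up values).
panel = ["INS", "GCG", "CD45"]
anchor_cell = (panel, [5.0, 1.0, 3.0], (120.0, 40.0))
neighbor_cell = (panel, [0.5, 4.0, 0.1], (130.0, 45.0))
print(multi_sentence(anchor_cell, [neighbor_cell], top_k=2))
```

Binning coordinates is one simple way to keep location tokens comparable across images of different sizes; other encodings, such as relative offsets to the anchor cell, would fit the same template.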