When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering

๐Ÿ“… 2025-11-14
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Current spatial transcriptomics analysis models treat genes as isolated numerical features, neglecting the biologically meaningful semantics encoded in gene symbolsโ€”thereby limiting functional interpretability. To address this, we propose SemST, a semantic-guided deep clustering framework that pioneers the integration of large language model (LLM)-derived gene semantic embeddings with graph neural network (GNN)-based modeling of spatial neighborhood structures. Its core innovation is the Fine-grained Semantic Modulation (FSM) module, which performs element-wise, dynamic calibration of spatial features via learnable affine transformations conditioned on gene semantics, offering plug-and-play compatibility. Evaluated on multiple public spatial transcriptomics datasets, SemST consistently outperforms state-of-the-art methods. Ablation studies confirm that FSM robustly enhances baseline performance across diverse settings, empirically validating the efficacy and generalizability of semantic guidance in spatial transcriptomics analysis.

Technology Category

Application Category

๐Ÿ“ Abstract
Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This prevents a truly deep understanding of critical biological characteristics. To overcome this limitation, we present SemST, a semantic-guided deep learning framework for spatial transcriptomics data clustering. SemST leverages Large Language Models (LLMs) to enable genes to"speak"through their symbolic meanings, transforming gene sets within each tissue spot into biologically informed embeddings. These embeddings are then fused with the spatial neighborhood relationships captured by Graph Neural Networks (GNNs), achieving a coherent integration of biological function and spatial structure. We further introduce the Fine-grained Semantic Modulation (FSM) module to optimally exploit these biological priors. The FSM module learns spot-specific affine transformations that empower the semantic embeddings to perform an element-wise calibration of the spatial features, thus dynamically injecting high-order biological knowledge into the spatial context. Extensive experiments on public spatial transcriptomics datasets show that SemST achieves state-of-the-art clustering performance. Crucially, the FSM module exhibits plug-and-play versatility, consistently improving the performance when integrated into other baseline methods.
Problem

Research questions and friction points this paper is trying to address.

Integrating gene biological semantics with spatial transcriptomics data clustering
Overcoming limitations of treating genes as isolated numerical features
Developing framework to combine symbolic gene meanings with spatial relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs transform gene symbols into biological embeddings
GNNs capture spatial neighborhood relationships for integration
FSM module calibrates spatial features with semantic embeddings
๐Ÿ”Ž Similar Papers
No similar papers found.
J
Jiangkai Long
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Y
Yanran Zhu
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Chang Tang
Chang Tang
Senior Member of IEEE/CCF/CSIG, School of Software Engineering, HUST, Wuhan, China.
Machine LearningPattern Recognition
K
Kun Sun
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Y
Yuanyuan Liu
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Xuesong Yan
Xuesong Yan
School of Computer Science, China University of Geosciences, Wuhan 430074, China