🤖 AI Summary
This work addresses a central problem in single-cell RNA sequencing (scRNA-seq) data integration: conventional methods harmonize the whole transcriptome uniformly and tend to over-correct, erasing genuine biological signal along with batch effects. The authors propose scHelix, a dataset-adaptive framework that splits genes at the input level into domain-invariant Anchors and domain-sensitive Variants. A dual-stream sparse diffusion encoder with stop-gradient graph caching learns multi-scale structural representations for the two gene sets. An asymmetric Align-Refine-Fuse protocol then aligns the Variant stream to the Anchor stream's topology and lets the Anchor stream absorb denoised detail through bounded residual gating, a design intended to prevent shortcut learning. The authors report that scHelix outperforms state-of-the-art methods in benchmarking, removing batch effects while preserving the integrity of biological clusters.
📝 Abstract
A critical challenge in single-cell RNA sequencing (scRNA-seq) integration is resolving the tension between eliminating batch effects and maintaining biological fidelity. Although recent evidence indicates that batch effects manifest heterogeneously across genes, most existing methods process the transcriptome uniformly, which frequently leads to over-correction and the loss of subtle biological signals. To address this, we present scHelix, a dataset-adaptive framework that explicitly partitions genes into domain-invariant Anchors and domain-sensitive Variants at the input level. scHelix uses a dual-stream sparse diffusion encoder equipped with stop-gradient graph caching to efficiently learn multi-scale structural representations. The core of our approach is a novel asymmetric Align-Refine-Fuse protocol: the unstable Variant stream is first aligned to the robust topology of the Anchor stream, followed by a conservative refinement phase in which the Anchor stream absorbs denoised details via bounded residual gating. This divide-and-conquer architecture prevents shortcut learning and ensures robust batch removal without compromising the integrity of biological clusters. Extensive benchmarking demonstrates that scHelix outperforms state-of-the-art methods.
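The abstract does not specify how genes are scored or how the gate is parameterized, so the following is only a minimal sketch of two of the ideas above under stated assumptions, not the scHelix implementation. It assumes (hypothetically) that a gene's batch sensitivity can be approximated by the fraction of its variance explained by batch means, that a fixed `threshold` separates Anchors from Variants, and that "bounded residual gating" means a sigmoid gate scaled into `(0, bound)` applied to the Variant-minus-Anchor residual. All function names and parameters are illustrative; the diffusion encoder, the stop-gradient graph cache, and the Align and Fuse steps are not modeled.

```python
import math
from statistics import fmean, pvariance


def batch_sensitivity(values, batches):
    """Fraction of a gene's variance explained by batch means.

    0 means the gene looks batch-invariant, 1 means batch-driven.
    This score is an illustrative assumption, not the paper's criterion.
    """
    total = pvariance(values)
    if total == 0.0:
        return 0.0
    grand = fmean(values)
    groups = {}
    for value, batch in zip(values, batches):
        groups.setdefault(batch, []).append(value)
    between = sum(
        len(group) * (fmean(group) - grand) ** 2 for group in groups.values()
    ) / len(values)
    return between / total


def partition_genes(expr, batches, threshold=0.5):
    """Split genes into Anchors (low score) and Variants (high score).

    expr maps gene name -> list of per-cell expression values.
    threshold is a hypothetical knob, not a value from the paper.
    """
    anchors, variants = [], []
    for gene, values in expr.items():
        if batch_sensitivity(values, batches) > threshold:
            variants.append(gene)
        else:
            anchors.append(gene)
    return anchors, variants


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bounded_residual_update(anchor, variant, gate_logit, bound=0.2):
    """Move the Anchor embedding toward the Variant embedding by a capped step.

    The gate lies in (0, bound), so the Anchor can absorb at most a
    `bound` fraction of the residual however large gate_logit grows.
    """
    gate = bound * _sigmoid(gate_logit)
    return [a + gate * (v - a) for a, v in zip(anchor, variant)]
```

With per-cell values for a few genes and a list of batch labels, `partition_genes(expr, batches)` returns the two gene lists, and `bounded_residual_update(anchor_vec, variant_vec, gate_logit)` returns the refined Anchor vector. Capping the gate is one simple way to keep the refinement conservative: the Variant stream can nudge the Anchor representation but cannot overwrite it.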