🤖 AI Summary
Spike sorting has long been limited by the lack of ground-truth labels in real neural recordings, which makes rigorous validation of heuristic approaches difficult. To address this, we propose SimSort, a data-driven framework grounded in large-scale biophysical simulation. SimSort generates high-fidelity synthetic electrophysiology datasets, introduces an end-to-end deep learning pretraining paradigm, and applies a zero-shot transfer strategy, so that spike sorting on real recordings requires no real-world annotations. Our method jointly models multi-channel waveforms, incorporates realistic neuronal dynamics, and leverages self-supervised pretraining. Evaluated across multiple real extracellular electrode datasets, SimSort consistently outperforms state-of-the-art methods in accuracy, robustness to noise and drift, and cross-platform generalizability. By removing the need for ground-truth labels from real recordings, SimSort addresses the evaluation bottleneck imposed by label scarcity and establishes a verifiable, scalable paradigm for experimental neuroscience.
📝 Abstract
Spike sorting is an essential process in neural recording that identifies and separates the electrical signals of individual neurons recorded by electrodes in the brain, enabling researchers to study how specific neurons communicate and process information. Although a number of spike sorting methods have contributed to significant neuroscientific breakthroughs, many are heuristically designed, and their correctness is hard to verify because ground-truth labels are difficult to obtain from real-world neural recordings. In this work, we explore a data-driven, deep learning-based approach. We begin by creating a large-scale dataset through electrophysiology simulations using biologically realistic computational models. We then present SimSort, a pretraining framework for spike sorting. Remarkably, when trained on our simulated dataset, SimSort demonstrates strong zero-shot generalization to real-world spike sorting tasks, significantly outperforming existing methods. Our findings underscore the potential of data-driven techniques to enhance the reliability and scalability of spike sorting in experimental neuroscience.
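The core argument is that heuristic sorters are hard to verify without ground truth, while simulated data supplies it. The following is a minimal sketch of that idea, assuming a toy single-channel recording. Two hypothetical units with Gaussian-shaped templates are injected into noise. A classical threshold-and-cluster sorter (robust MAD threshold, k-means on waveform snippets) then runs on the trace, and its output is scored against the known spike times. This is not SimSort, which is a learned model pretrained on biophysically simulated multi-channel data. Every function name, template, and parameter here (`simulate_recording`, `detect_spikes`, `kmeans`, `score`, the 5-sigma threshold, the 3-sample matching tolerance) is an illustrative assumption.

```python
import itertools
import numpy as np


def simulate_recording(n_samples=20_000, amps=(-8.0, -4.0), noise_sd=0.5, seed=0):
    """Toy single-channel trace: hypothetical units with fixed templates plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(-10, 10)
    shape = np.exp(-0.5 * (t / 2.0) ** 2)  # assumed spike shape, not a biophysical model
    x = rng.normal(0.0, noise_sd, n_samples)
    slots = np.arange(100, n_samples - 100, 100)  # one candidate spike per slot, so no overlaps
    unit = rng.integers(-1, len(amps), size=slots.size)  # -1 means "no spike in this slot"
    times, labels = slots[unit >= 0], unit[unit >= 0]
    for s, u in zip(times, labels):
        x[s + t] += amps[u] * shape
    return x, times, labels  # trace plus ground-truth spike times and unit labels


def detect_spikes(x, k=5.0, dead=30, search=10):
    """Negative threshold crossings aligned to the trough, with a dead time after each spike."""
    sigma = np.median(np.abs(x)) / 0.6745  # robust (MAD-based) noise estimate
    below = x < -k * sigma
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    peaks, last = [], -dead
    for c in crossings:
        if c - last < dead:
            continue  # re-crossing inside the previous spike
        p = c + int(np.argmin(x[c:c + search]))  # align to the local trough
        peaks.append(p)
        last = p
    return np.array(peaks, dtype=int)


def extract_snippets(x, peaks, pre=10, post=10):
    """Cut a fixed window around each peak, dropping peaks too close to the edges."""
    keep = peaks[(peaks >= pre) & (peaks + post <= len(x))]
    return keep, np.stack([x[p - pre:p + post] for p in keep])


def kmeans(X, k=2, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[np.argmin(X.min(axis=1))]]  # start from the deepest waveform
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def score(gt_times, gt_labels, det_times, det_labels, k=2, tol=3):
    """Fraction of true spikes detected within `tol` samples and assigned the right unit
    (cluster ids are arbitrary, so the best label permutation is used)."""
    pairs = []
    for t, g in zip(gt_times, gt_labels):
        j = int(np.argmin(np.abs(det_times - t)))
        if abs(int(det_times[j]) - int(t)) <= tol:
            pairs.append((int(g), int(det_labels[j])))
    best = max(sum(perm[d] == g for g, d in pairs) for perm in itertools.permutations(range(k)))
    return best / len(gt_times)


x, gt_times, gt_labels = simulate_recording()
peaks, snippets = extract_snippets(x, detect_spikes(x))
det_labels = kmeans(snippets, k=2)
acc = score(gt_times, gt_labels, peaks, det_labels, k=2)
print(f"{len(gt_times)} true spikes, {len(peaks)} detected, accuracy {acc:.3f}")
```

The MAD threshold and farthest-point initialisation are simple, common choices made here only to keep the toy deterministic. Real recordings add overlapping spikes, drift, and multi-channel structure that this sketch ignores, and the scoring step works only because the simulation provides the true spike times and labels.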