🤖 AI Summary
Neurocomputational modeling and artificial implementation of tactile perception lag significantly behind the vision and language domains. This paper proposes an Encoder-Attender-Decoder (EAD) framework that uses task-optimized convolutional recurrent neural networks (ConvRNNs) to process time-series tactile signals from a biomimetic whisker array, achieving, for the first time, quantitative alignment between learned neural representations and in vivo mouse somatosensory cortical activity. Key contributions include: (1) the first quantitative characterization of inductive biases in the somatosensory cortex; (2) empirical evidence that nonlinear recurrent dynamics are important for general-purpose tactile representations; and (3) a tactile-specific contrastive self-supervised paradigm that enables label-free neural fitting. Experiments show that ConvRNN encoders outperform feedforward and state-space encoders on tactile categorization; that the resulting models saturate the explainable neural variability; that supervised categorization performance is linearly related to neural alignment; and that self-supervised variants match the neural fits of supervised ones.
📝 Abstract
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems than more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
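
To make the Encoder-Attender-Decoder data flow concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a 1D whisker array, a single-channel convolutional recurrent update as the encoder, softmax pooling over time as a stand-in for the attender, and a linear readout as the decoder. All function names, kernel values, and sizes (`conv_rnn_encode`, `attend`, `decode`, `k_in`, `k_rec`, and so on) are hypothetical and chosen only for illustration; the weights are random rather than task-optimized.

```python
import math
import random


def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation along the whisker axis (output length = input length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def conv_rnn_encode(sequence, k_in, k_rec):
    """Encoder (assumed form): h_t = tanh(conv(x_t, k_in) + conv(h_{t-1}, k_rec))."""
    h = [0.0] * len(sequence[0])
    states = []
    for x_t in sequence:
        drive = conv1d_same(x_t, k_in)      # feedforward input at this time step
        recur = conv1d_same(h, k_rec)       # nonlinear recurrence over the previous state
        h = [math.tanh(u + v) for u, v in zip(drive, recur)]
        states.append(h)
    return states


def attend(states, query):
    """Attender stand-in: softmax-weighted average of hidden states over time."""
    scores = [sum(q * s for q, s in zip(query, h)) for h in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(len(query))]
    return pooled, weights


def decode(pooled, readout):
    """Decoder: linear readout from the pooled state to class logits."""
    return [sum(w * p for w, p in zip(row, pooled)) for row in readout]


# Hypothetical sizes and random data, used only to exercise the pipeline.
random.seed(0)
n_whiskers, n_steps, n_classes = 5, 8, 3
sequence = [[random.uniform(-1, 1) for _ in range(n_whiskers)] for _ in range(n_steps)]
states = conv_rnn_encode(sequence, k_in=[0.25, 0.5, 0.25], k_rec=[0.1, 0.3, 0.1])
pooled, weights = attend(states, query=[1.0] * n_whiskers)
readout = [[random.uniform(-1, 1) for _ in range(n_whiskers)] for _ in range(n_classes)]
logits = decode(pooled, readout)
print(len(states), len(logits))
```

In this sketch the recurrent term is what separates the encoder from a purely feedforward one: setting `k_rec` to all zeros makes each hidden state depend only on the current input frame.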