🤖 AI Summary
This work addresses limitations of conventional deep learning in speech processing: its reliance on large-scale data and global backpropagation, its entangled representations, and its lack of biological plausibility. It applies the Assembly Calculus to continuous speech signals, where prior work focused on discrete symbolic inputs, through a neural encoding scheme that combines probabilistic mel binarisation with population-coded MFCCs. A multi-area, hierarchical spiking architecture built on Hebbian plasticity and winner-take-all dynamics detects phone boundaries (F1=0.69) and word boundaries (F1=0.61) without any weight training, and reaches 47.5% accuracy on phone recognition and 45.1% on command recognition. The approach offers a biologically grounded, interpretable alternative for speech processing.
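The Hebbian plasticity and winner-take-all dynamics named in the summary can be illustrated with a minimal sketch of an assembly projection, assuming the commonly described form of the operation: random sparse connectivity, a k-winners-take-all cap, and multiplicative weight strengthening. The function names and every parameter value (`n_target`, `k`, `p`, `beta`, `rounds`) are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def k_winners(inputs, k):
    """Return the sorted indices of the k neurons with the largest input (k-WTA)."""
    ranked = sorted(range(len(inputs)), key=lambda i: -inputs[i])
    return sorted(ranked[:k])


def project(stimulus, n_target=200, k=20, p=0.1, beta=0.1, rounds=10, seed=0):
    """Repeatedly fire a binary stimulus into a target area.

    Returns the list of winner sets, one per round. All parameters are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    n_source = len(stimulus)
    # Random sparse connectivity: w_ff[j][i] is the synapse from source i to target j.
    w_ff = [[1.0 if rng.random() < p else 0.0 for _ in range(n_source)]
            for _ in range(n_target)]
    # Recurrent synapses inside the target area (no self-connections).
    w_rec = [[1.0 if i != j and rng.random() < p else 0.0 for i in range(n_target)]
             for j in range(n_target)]
    active_src = [i for i, s in enumerate(stimulus) if s]
    winners = []
    history = []
    for _ in range(rounds):
        # Total synaptic input: feedforward from the stimulus plus
        # recurrent input from the previous round's winners.
        inputs = [sum(w_ff[j][i] for i in active_src)
                  + sum(w_rec[j][i] for i in winners)
                  for j in range(n_target)]
        new_winners = k_winners(inputs, k)
        # Hebbian update: strengthen existing synapses from neurons that
        # just fired onto the neurons that won (absent synapses stay at 0).
        for j in new_winners:
            for i in active_src:
                w_ff[j][i] *= 1.0 + beta
            for i in winners:
                w_rec[j][i] *= 1.0 + beta
        winners = new_winners
        history.append(winners)
    return history


if __name__ == "__main__":
    stim = [1] * 30 + [0] * 70  # toy binary stimulus, not real speech data
    for step, w in enumerate(project(stim, rounds=5)):
        print(step, w[:5])
```

In this sketch no gradient is computed anywhere: the only learning signal is local co-activity, which is the contrast with global backpropagation that the summary draws.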
📝 Abstract
Deep learning dominates speech processing but relies on massive datasets and global backpropagation-guided weight updates, and it produces entangled representations. Assembly Calculus (AC), which models sparse neuronal assemblies via Hebbian plasticity and winner-take-all competition, offers a biologically grounded alternative, yet prior work has focused on discrete symbolic inputs. We introduce an AC-based speech processing framework that operates directly on continuous speech by combining three key contributions: (i) neural encoding that converts speech into assembly-compatible spike patterns using probabilistic mel binarisation and population-coded MFCCs; (ii) a multi-area architecture organising assemblies across hierarchical timescales and classes; and (iii) cross-area update schemes for downstream tasks. Applied to two core tasks, boundary detection and segment classification, our framework detects phone (F1=0.69) and word (F1=0.61) boundaries without any weight training, and achieves 47.5% and 45.1% accuracy on phone and command recognition, respectively. These results show that AC-based dynamical systems are a viable alternative to deep learning for speech processing.
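Contribution (i) can be sketched as follows, under stated assumptions about what the two encoding terms mean, since the abstract does not define them. Here "probabilistic mel binarisation" is taken to mean that each mel bin fires with probability equal to its min-max normalised energy, and "population-coded MFCCs" is taken to mean that each coefficient activates the few neurons whose preferred values lie closest to it. The function names, value range, and population sizes are hypothetical, and the input numbers are made up for illustration.

```python
import random


def binarise_mel(mel_frame, rng):
    """Bernoulli-sample one spike per mel bin.

    Assumption: firing probability is the bin's min-max normalised energy.
    """
    lo, hi = min(mel_frame), max(mel_frame)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat frame
    probs = [(m - lo) / span for m in mel_frame]
    return [1 if rng.random() < p else 0 for p in probs]


def population_code(value, lo, hi, n_neurons=10, k=2):
    """Encode a scalar by firing the k neurons with the nearest preferred values.

    Assumption: preferred values are evenly spaced over [lo, hi].
    """
    value = max(lo, min(hi, value))  # clip to the coded range
    centres = [lo + (hi - lo) * i / (n_neurons - 1) for i in range(n_neurons)]
    nearest = sorted(range(n_neurons), key=lambda i: abs(centres[i] - value))[:k]
    return [1 if i in nearest else 0 for i in range(n_neurons)]


def encode_frame(mel_frame, mfcc_frame, rng, lo=-1.0, hi=1.0):
    """Concatenate mel spikes with one population code per MFCC coefficient."""
    spikes = binarise_mel(mel_frame, rng)
    for coeff in mfcc_frame:
        spikes.extend(population_code(coeff, lo, hi))
    return spikes


if __name__ == "__main__":
    rng = random.Random(0)
    mel = [0.0, 2.5, 5.0, 10.0]   # toy mel energies, not real data
    mfcc = [0.1, -0.4]            # toy MFCC values, assumed scaled to [-1, 1]
    print(encode_frame(mel, mfcc, rng))
```

The output is a fixed-length binary vector per frame, which is the kind of sparse spike pattern an assembly area with winner-take-all competition can take as input.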