Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses limitations of conventional deep learning in speech processing: its reliance on large-scale data, global backpropagation, and entangled representations, none of which is biologically plausible. It applies the Assembly Calculus to continuous speech signals, where prior work focused on discrete symbolic inputs, through a neural encoding scheme that combines probabilistic Mel binarization with population-coded MFCCs. A multi-area, hierarchical architecture built on Hebbian plasticity and winner-take-all dynamics detects phone boundaries (F1=0.69) and word boundaries (F1=0.61) without any weight training, and reaches 47.5% accuracy on phone recognition and 45.1% on command recognition. The results position Assembly Calculus dynamics as a viable, biologically grounded alternative to deep learning for speech processing.
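To make the two mechanisms named in the summary concrete, here is a minimal sketch of a single-area projection step in the general style of the Assembly Calculus: a fixed stimulus drives a target area, the k most strongly driven neurons fire (winner-take-all), and synapses from the stimulus to those winners are strengthened (Hebbian plasticity). The function names, the connection probability `p`, the cap size `k`, and the plasticity rate `beta` are all illustrative assumptions, not the paper's architecture or settings.

```python
import random


def k_cap(drive, k):
    """Winner-take-all: indices of the k neurons with the largest input.

    Ties are broken by the lower neuron index so the result is deterministic.
    """
    order = sorted(range(len(drive)), key=lambda i: (-drive[i], i))
    return sorted(order[:k])


def random_weights(n_src, n_dst, p, rng):
    """Random bipartite connectivity: each synapse exists with probability p."""
    return [[1.0 if rng.random() < p else 0.0 for _ in range(n_dst)]
            for _ in range(n_src)]


def project(stimulus, weights, k, beta, rounds):
    """Fire a fixed stimulus into a target area for several rounds.

    stimulus: indices of the active source neurons
    weights:  weights[src][dst], updated in place
    Returns the winner set of every round.
    """
    n_dst = len(weights[0])
    history = []
    for _ in range(rounds):
        # Total synaptic input each target neuron receives from the stimulus.
        drive = [sum(weights[s][j] for s in stimulus) for j in range(n_dst)]
        winners = k_cap(drive, k)
        # Hebbian update: existing synapses from firing sources to winners
        # are scaled by (1 + beta); no error signal or gradient is involved.
        for s in stimulus:
            for j in winners:
                if weights[s][j] > 0:
                    weights[s][j] *= 1.0 + beta
        history.append(winners)
    return history


rng = random.Random(0)
W = random_weights(n_src=50, n_dst=200, p=0.1, rng=rng)  # hypothetical sizes
history = project(stimulus=list(range(10)), weights=W, k=20, beta=0.1, rounds=5)
```

Because only the stimulus drives the area in this simplified version, the winner set is fixed after the first round and the Hebbian update only reinforces it. Recurrent connections inside the target area, which this sketch omits, are what would let the winner set change between rounds before settling.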

📝 Abstract

Deep learning dominates speech processing but relies on massive datasets and global backpropagation-guided weight updates, and it produces entangled representations. Assembly Calculus (AC), which models sparse neuronal assemblies via Hebbian plasticity and winner-take-all competition, offers a biologically grounded alternative, yet prior work focused on discrete symbolic inputs. We introduce an AC-based speech processing framework that operates directly on continuous speech by combining three key contributions: (i) neural encoding that converts speech into assembly-compatible spike patterns using probabilistic mel binarisation and population-coded MFCCs; (ii) a multi-area architecture organising assemblies across hierarchical timescales and classes; and (iii) cross-area update schemes for downstream tasks. Applied to two core tasks, boundary detection and segment classification, our framework detects phone (F1=0.69) and word (F1=0.61) boundaries without any weight training, and achieves 47.5% and 45.1% accuracy on phone and command recognition, respectively. These results show that AC-based dynamical systems are a viable alternative to deep learning for speech processing.
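The abstract does not spell out the encoding in contribution (i), so the following is only one plausible reading, offered as a sketch. It assumes that probabilistic mel binarisation means treating each min-max normalised mel energy as a Bernoulli firing probability, and that population coding means representing each MFCC value with a bank of Gaussian-tuned neurons of which the k most responsive fire. The function names, tuning width `sigma`, and input values are made up for illustration.

```python
import math
import random


def probabilistic_binarise(mel_frame, rng):
    """Turn one frame of mel energies into a binary spike vector.

    Assumption: energies are min-max normalised to [0, 1] within the frame
    and each bin spikes with probability equal to its normalised energy.
    """
    lo, hi = min(mel_frame), max(mel_frame)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat frame
    probs = [(e - lo) / span for e in mel_frame]
    spikes = [1 if rng.random() < p else 0 for p in probs]
    return spikes, probs


def population_code(value, lo, hi, n_neurons, k, sigma):
    """Encode a scalar with a bank of Gaussian-tuned neurons.

    Assumption: preferred values are evenly spaced on [lo, hi] and the k
    neurons with the strongest response fire (ties go to the lower index).
    """
    centres = [lo + (hi - lo) * i / (n_neurons - 1) for i in range(n_neurons)]
    response = [math.exp(-((value - c) ** 2) / (2 * sigma ** 2)) for c in centres]
    order = sorted(range(n_neurons), key=lambda i: (-response[i], i))
    active = set(order[:k])
    return [1 if i in active else 0 for i in range(n_neurons)]


rng = random.Random(0)
mel = [-8.0, -3.5, -1.0, -6.0, -7.5]  # made-up log-mel energies for one frame
spikes, probs = probabilistic_binarise(mel, rng)
code = population_code(0.25, lo=-1.0, hi=1.0, n_neurons=11, k=3, sigma=0.2)
```

Both functions return sparse binary vectors, the kind of input a winner-take-all area can consume. The weakest mel bin never spikes and the strongest always does under this normalisation, while intermediate bins fire stochastically from frame to frame.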

Problem

Research questions and friction points this paper is trying to address.

speech processing

deep learning limitations

neural assemblies

continuous speech

biologically plausible models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Assembly Calculus

speech segmentation

neural assemblies