🤖 AI Summary
This study investigates whether the internal representations of self-supervised speech models are organized in terms of interpretable phonological feature vectors. Through cross-linguistic analysis of model representations across 96 languages—integrating phonological probing, vector arithmetic operations, and acoustic-phonological correspondence modeling—the work demonstrates for the first time that these models exhibit phonological vector arithmetic capabilities (e.g., [d] − [t] + [p] ≈ [b]) and can generate continuous voicing contrasts via vector scaling. The findings reveal that the learned representations are systematic, structured, and compositional, with the vector space continuously reflecting the degree of acoustic realization of phonological features. This provides strong evidence for the phonological interpretability of self-supervised speech representations.
📝 Abstract
Self-supervised speech models (S3Ms) are known to encode rich phonetic information, yet how this information is structured remains underexplored. We conduct a comprehensive study across 96 languages to analyze the underlying structure of S3M representations, with particular attention to phonological vectors. We first show that there exist linear directions within the model's representation space that correspond to phonological features. We further demonstrate that the scale of these phonological vectors correlates with the degree of acoustic realization of their corresponding phonological features in a continuous manner. For example, the difference between [d] and [t] yields a voicing vector: adding this vector to [p] produces [b], while scaling it results in a continuum of voicing. Together, these findings indicate that S3Ms encode speech using phonologically interpretable and compositional vectors, demonstrating phonological vector arithmetic. All code and interactive demos are available at https://github.com/juice500ml/phonetic-arithmetic.
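The vector-arithmetic idea ([d] − [t] + [p] ≈ [b]) can be illustrated with a toy NumPy sketch. This is not the paper's code: the embeddings below are synthetic stand-ins for mean S3M phone representations, constructed so that a shared "voicing" direction separates voiced from voiceless stops. The phone names and the `nearest` helper are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-ins for mean S3M phone representations (hypothetical;
# the paper averages frame-level model embeddings per phone instead).
rng = np.random.default_rng(0)
dim = 16
voicing = rng.normal(size=dim)   # shared "voicing" direction
base_t = rng.normal(size=dim)    # voiceless alveolar stop [t]
base_p = rng.normal(size=dim)    # voiceless bilabial stop [p]

phones = {
    "t": base_t,
    "d": base_t + voicing,       # [d] = [t] + voicing
    "p": base_p,
    "b": base_p + voicing,       # [b] = [p] + voicing
}

def nearest(vec, inventory):
    """Return the phone whose embedding is most cosine-similar to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(inventory, key=lambda ph: cos(vec, inventory[ph]))

# Phonological vector arithmetic: [d] - [t] + [p] lands nearest to [b].
voicing_vec = phones["d"] - phones["t"]
print(nearest(phones["p"] + voicing_vec, phones))  # -> b

# Scaling the voicing vector traces a continuum from [p] toward [b].
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    interp = phones["p"] + alpha * voicing_vec
    print(f"alpha={alpha:.2f} -> nearest: {nearest(interp, phones)}")
```

In the actual study the inventory covers full per-language phone sets and the vectors come from the model's hidden states, but the arithmetic and the scaling-based voicing continuum follow the same pattern shown here.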