Rethinking Continual Learning for Speech and Audio: A Representation-Centric Taxonomy and Open Problems

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a gap in existing continual learning approaches for speech: they overlook the coupled, geometry-sensitive nature of acoustic representations, which is especially problematic when adapting the highly entangled, continuous representations of modern foundation models. Taking a representation-centric perspective, the study proposes a new taxonomy of continual learning for speech and audio. The taxonomy is grounded in the joint evolution of linguistic, speaker, and paralinguistic factors within a shared latent space, and it organizes methods by how representation geometry evolves under non-stationary acoustic conditions. The paper also identifies key mismatches between current continual learning assumptions and the behavior of speech foundation models, and outlines open challenges and research directions for building robust speech systems in non-stationary environments.

📝 Abstract

Speech and audio systems operate in inherently non-stationary environments, yet continual learning (CL) research in this domain, especially in the foundation model era, remains fragmented and fails to account for the coupled, geometry-sensitive nature of acoustic representations. Modern speech foundation models operate over highly entangled, continuous representations that jointly encode linguistic, speaker, and paralinguistic factors within a shared latent space. CL is therefore fundamentally about preserving and evolving shared representation structure rather than retaining isolated task knowledge. In this work, we revisit CL for speech from a representation-centered perspective and introduce a new taxonomy that organizes CL according to how the underlying representation geometry evolves under non-stationary acoustic conditions. We further identify key mismatches between current CL assumptions and speech foundation model behavior, and finally outline a set of open challenges and future research directions.

Problem

Research questions and friction points this paper is trying to address.

continual learning

speech foundation models

acoustic representations

representation geometry

non-stationary environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning

speech foundation models

representation geometry