🤖 AI Summary
This paper addresses the fundamental stability–plasticity dilemma in continual named entity recognition (CNER), where models struggle to retain prior knowledge while adapting to new entity types. To resolve this, we propose a dual-perspective co-optimization framework that jointly regulates the representation and parameter spaces. Methodologically: (1) we design a representation-dimension-aggregated knowledge distillation to mitigate the excessive stability induced by conventional knowledge distillation; (2) we introduce a weight-guided selective fusion mechanism that dynamically balances parameter inheritance from the old model with adaptation to new tasks; (3) we adopt a confidence-driven pseudo-labeling strategy to suppress semantic drift of the non-entity type. Evaluated on three benchmark datasets under ten diverse continual learning settings, our approach consistently outperforms existing state-of-the-art methods. It is the first to achieve controllable trade-offs in both the representation and parameter spaces, establishing an interpretable and scalable paradigm for CNER.
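The two parameter- and representation-space ideas above can be sketched in plain Python. This is a minimal illustration, not the paper's exact formulation: the pooling width, the magnitude-based importance score, and the fusion coefficient `alpha` are all assumptions.

```python
def pooled_kd_loss(old_feats, new_feats, pool=2):
    """Pooled (dimension-aggregated) KD sketch: average adjacent
    representation dimensions before matching old and new features,
    so the new model need not copy the old one dimension-by-dimension
    (this is the relaxation that grants extra plasticity)."""
    def pool_vec(v):
        return [sum(v[i:i + pool]) / pool for i in range(0, len(v), pool)]
    old_p, new_p = pool_vec(old_feats), pool_vec(new_feats)
    # Mean squared error between the pooled representations.
    return sum((a - b) ** 2 for a, b in zip(old_p, new_p)) / len(old_p)

def weight_guided_fusion(w_old, w_new, alpha=0.5):
    """Weight-guided selective fusion sketch: each old weight is kept in
    proportion to a hypothetical magnitude-based importance score, so
    significant old weights dominate while the rest follow the new model."""
    fused = []
    for o, n in zip(w_old, w_new):
        imp = abs(o) / (abs(o) + abs(n) + 1e-8)  # importance of the old weight
        fused.append(alpha * imp * o + (1 - alpha * imp) * n)
    return fused
```

Note how `pooled_kd_loss([1, 3, 2, 4], [2, 2, 3, 3])` is zero even though the per-dimension representations differ: only the pooled averages must agree, which is precisely the controlled plasticity the summary describes.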
📝 Abstract
Continual Named Entity Recognition (CNER) is an evolving field that focuses on sequentially updating an existing model to incorporate new entity types. Previous CNER methods primarily utilize Knowledge Distillation (KD) to preserve prior knowledge and overcome catastrophic forgetting, strictly ensuring that the representations of old and new models remain consistent. Consequently, they often impart the model with excessive stability (i.e., retention of old knowledge) but limited plasticity (i.e., acquisition of new knowledge). To address this issue, we propose a Stability-Plasticity Trade-off (SPT) method for CNER that balances these aspects from both representation and weight perspectives. From the representation perspective, we introduce a pooling operation into the original KD, permitting a level of plasticity by consolidating representation dimensions. From the weight perspective, we dynamically merge the weights of old and new models, strengthening old knowledge while maintaining new knowledge. During this fusion, we implement a weight-guided selective mechanism to prioritize significant weights. Moreover, we develop a confidence-based pseudo-labeling approach for the current non-entity type, which predicts entity types using the old model to handle the semantic shift of the non-entity type, a challenge specific to CNER that has largely been ignored by previous methods. Extensive experiments across ten CNER settings on three benchmark datasets demonstrate that our SPT method surpasses previous CNER approaches, highlighting its effectiveness in achieving a suitable stability-plasticity trade-off.
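The confidence-based pseudo-labeling step for the non-entity type can be illustrated with a short sketch. Names such as `old_model_predict` and the threshold `tau` are hypothetical stand-ins, not the paper's API: tokens currently labeled "O" may in fact belong to old entity types, so the old model relabels them when it is sufficiently confident.

```python
def pseudo_label_non_entity(tokens, gold, old_model_predict, tau=0.9):
    """Relabel current non-entity ("O") tokens with the old model's
    prediction when its confidence exceeds tau, countering the semantic
    shift of the non-entity type across CNER steps (sketch only)."""
    labels = []
    for tok, y in zip(tokens, gold):
        if y == "O":
            label, conf = old_model_predict(tok)  # old model's type + confidence
            # Keep "O" unless the old model confidently sees an old entity type.
            labels.append(label if conf >= tau and label != "O" else "O")
        else:
            labels.append(y)  # gold labels for current entity types stay intact
    return labels
```

A low `tau` propagates more old-model predictions (more stability, more noise); a high `tau` trusts only confident ones, which is the trade-off the confidence threshold controls.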