Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations

📅 2025-01-31
🤖 AI Summary
Problem: Sequence labeling tasks such as named entity recognition (NER) remain difficult in low-resource, domain-specific Chinese text, where models show weak contextual understanding and handle nested entities poorly. Method: This paper proposes KnowFREE, a span-based, label-explanation-driven framework that needs no external knowledge at inference time. An explanation-driven knowledge enhancement workflow uses large language models (LLMs) to generate context-sensitive label explanations, and a lightweight knowledge fusion mechanism, together with extension label features, lets the model correct label semantic biases and extract nested entities jointly. Contribution/Results: KnowFREE achieves state-of-the-art (SOTA) performance on multiple Chinese low-resource NER benchmarks, with F1-score improvements over prior methods. The summary also reports robustness on nested structures and sparsely annotated instances.
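
To make the idea of an explanation prompt concrete, here is a minimal sketch of how such a prompt might be assembled for one annotated entity. The function name `build_explanation_prompt`, the prompt wording, and the example sentence and `DRUG` label are all assumptions made for illustration; the paper's actual prompts are not given on this page.

```python
def build_explanation_prompt(sentence: str, entity: str, label: str) -> str:
    """Assemble a hypothetical prompt asking an LLM to explain an entity in context."""
    return (
        "You are annotating domain-specific text.\n"
        f"Sentence: {sentence}\n"
        f"Target entity: {entity}\n"
        f"Assigned label: {label}\n"
        "In one or two sentences, explain what the entity means in this "
        "sentence and why the label applies."
    )


# Illustrative input only; not taken from the paper's datasets.
prompt = build_explanation_prompt("患者服用阿司匹林后症状缓解。", "阿司匹林", "DRUG")
print(prompt)
```

The returned string would be sent to whichever LLM is available; the explanation it produces is the kind of contextual signal the summary describes as training-time enrichment.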

📝 Abstract
Sequence labeling remains a significant challenge in low-resource, domain-specific scenarios, particularly for character-dense languages like Chinese. Existing methods primarily focus on enhancing model comprehension and improving data diversity to boost performance. However, these approaches still struggle with inadequate model applicability and semantic distribution biases in domain-specific contexts. To overcome these limitations, we propose a novel framework that combines an LLM-based knowledge enhancement workflow with a span-based Knowledge Fusion for Rich and Efficient Extraction (KnowFREE) model. Our workflow employs explanation prompts to generate precise contextual interpretations of target entities, effectively mitigating semantic biases and enriching the model's contextual understanding. The KnowFREE model further integrates extension label features, enabling efficient nested entity extraction without relying on external knowledge during inference. Experiments on multiple Chinese domain-specific sequence labeling datasets demonstrate that our approach achieves state-of-the-art performance, effectively addressing the challenges posed by low-resource settings.
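
A span-based formulation handles nested entities because each candidate span is scored on its own, so an inner span and the outer span that contains it can both be kept. The sketch below illustrates only that decoding idea. The scores are hand-written toy values, and `decode_spans`, `is_nested`, and the 0.5 threshold are hypothetical names and settings, not the KnowFREE model's actual scoring or fusion procedure.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, label)


def decode_spans(span_scores: Dict[Span, float], threshold: float = 0.5) -> List[Span]:
    """Keep every candidate span whose score clears the threshold.

    Spans are judged independently, so overlapping or nested spans can all survive.
    """
    kept = [span for span, score in span_scores.items() if score >= threshold]
    # Order by start position, longer spans first, for readable output.
    return sorted(kept, key=lambda s: (s[0], -s[1]))


def is_nested(inner: Span, outer: Span) -> bool:
    """True when `inner` lies entirely within the boundaries of `outer`."""
    return inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]


chars = list("北京大学计算机学院")  # 9 characters, indices 0..8
scores = {
    (0, 1, "LOC"): 0.91,  # 北京
    (0, 3, "ORG"): 0.88,  # 北京大学
    (0, 8, "ORG"): 0.79,  # 北京大学计算机学院
    (4, 6, "ORG"): 0.12,  # 计算机, below the threshold
}

entities = decode_spans(scores)
for start, end, label in entities:
    print("".join(chars[start:end + 1]), label)
```

A token-level BIO tagger assigns one tag per character and so cannot emit 北京 and 北京大学 at the same time, which is the limitation span-based decoding avoids.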
Problem

Research questions and friction points this paper is trying to address.

Limited Resources
Domain-specific Sequence Labeling
Context Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Model Enhancement
KnowFREE Model Integration
Domain-specific Lexical Understanding
Authors
Peichao Lai
School of Computer Science, Peking University
Jiaxin Gan
College of Computer and Data Science, Fuzhou University
Feiyang Ye
University of Technology Sydney, Ph.D student
Multi-Task Learning
Yilei Wang
Alibaba Cloud
Bin Cui
School of Computer Science, Peking University