🤖 AI Summary
Sequence labeling tasks such as named entity recognition (NER) in low-resource Chinese domains suffer from weak contextual understanding and poor performance on nested entities. Method: This paper proposes KnowFREE, a span-based, label-explanation-driven framework that requires no external knowledge at inference time. It integrates context-sensitive label explanations generated by large language models (LLMs) and introduces a lightweight knowledge fusion mechanism that both corrects label semantic biases and extracts nested entities. The paper also designs an explanation-driven knowledge enhancement workflow and an extended label feature modeling strategy. Contribution/Results: KnowFREE achieves state-of-the-art (SOTA) performance on multiple Chinese low-resource NER benchmarks, with significant F1-score improvements. It is also robust on nested structures and sparsely annotated instances, which supports its effectiveness in challenging low-resource scenarios.
📝 Abstract
Sequence labeling remains a significant challenge in low-resource, domain-specific scenarios, particularly for character-dense languages like Chinese. Existing methods primarily focus on enhancing model comprehension and improving data diversity to boost performance. However, these approaches still struggle with inadequate model applicability and semantic distribution biases in domain-specific contexts. To overcome these limitations, we propose a novel framework that combines an LLM-based knowledge enhancement workflow with a span-based Knowledge Fusion for Rich and Efficient Extraction (KnowFREE) model. Our workflow employs explanation prompts to generate precise contextual interpretations of target entities, effectively mitigating semantic biases and enriching the model's contextual understanding. The KnowFREE model further integrates extension label features, enabling efficient nested entity extraction without relying on external knowledge during inference. Experiments on multiple Chinese domain-specific sequence labeling datasets demonstrate that our approach achieves state-of-the-art performance, effectively addressing the challenges posed by low-resource settings.
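The abstract does not spell out the span formulation, so the following is only a generic sketch of why span-based labeling can represent nested entities where one-tag-per-character schemes cannot. It is not the KnowFREE model itself. The function names, the example sentence, and its annotation are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) over character indices, end exclusive


def enumerate_spans(n_chars: int, max_len: int) -> List[Span]:
    """List every candidate span of at most max_len characters."""
    return [
        (start, end)
        for start in range(n_chars)
        for end in range(start + 1, min(start + max_len, n_chars) + 1)
    ]


def label_spans(spans: List[Span], gold: Dict[Span, str]) -> Dict[Span, str]:
    """Give each candidate span its gold label, or "O" if it is not an entity."""
    return {span: gold.get(span, "O") for span in spans}


text = "北京大学校长"
# Hypothetical annotation: a location nested inside an organization.
gold = {(0, 2): "LOC", (0, 4): "ORG"}

spans = enumerate_spans(len(text), max_len=4)
labels = label_spans(spans, gold)

# Overlapping spans keep separate labels, so both entities survive.
entities = {text[s:e]: tag for (s, e), tag in labels.items() if tag != "O"}
print(entities)
```

Because each span is classified independently, the inner location and the outer organization do not compete for the same character-level tag. A real model would replace the gold lookup with a learned span classifier.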