🤖 AI Summary
Legal information extraction is hampered by costly manual annotation and scarce labeled data. This work proposes a simple, general-purpose data augmentation pipeline built on large language models (LLMs) that automatically generates training samples, reducing reliance on human-annotated data. According to the authors, the approach makes legal information extraction systems more robust and can be applied to other natural language processing tasks beyond the legal domain.
📝 Abstract
In this paper, we propose a pipeline leveraging Large Language Models (LLMs) for data augmentation in Information Extraction tasks within the legal domain. The proposed method is both simple and effective, significantly reducing the manual effort required for data annotation while enhancing the robustness of Information Extraction systems. Furthermore, the method is generalizable, making it applicable to various Natural Language Processing (NLP) tasks beyond the legal domain.
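The abstract does not spell out the pipeline's individual steps, so the following is only a minimal sketch of one common form of label-preserving augmentation, not the authors' method. It assumes annotated examples carry entity strings with labels, and that a rewriting function (here the hypothetical `toy_rewrite`, standing in for an actual LLM call) returns candidate paraphrases. Candidates are kept only if every annotated entity string survives verbatim, so the original labels can be reused. All names (`Example`, `augment`, `toy_rewrite`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    text: str
    entities: List[Tuple[str, str]]  # (surface string, label)


def augment(
    example: Example,
    rewrite: Callable[[str, int], str],
    n_variants: int = 3,
) -> List[Example]:
    """Request paraphrases and keep those that preserve every entity string."""
    kept: List[Example] = []
    seen = {example.text}
    for i in range(n_variants):
        candidate = rewrite(example.text, i)
        if candidate in seen:
            continue  # skip duplicates and unchanged text
        # Filter: discard rewrites that dropped or altered an annotated entity.
        if all(surface in candidate for surface, _ in example.entities):
            kept.append(Example(candidate, list(example.entities)))
            seen.add(candidate)
    return kept


def toy_rewrite(text: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM call; the third template loses the entities."""
    templates = [
        "According to the record, {t}",
        "{t} This was noted by the court.",
        "It was held that a party breached the contract.",
    ]
    return templates[seed % len(templates)].format(t=text)


seed_example = Example(
    text="Acme Corp breached the contract on 3 May 2020.",
    entities=[("Acme Corp", "PARTY"), ("3 May 2020", "DATE")],
)
augmented = augment(seed_example, toy_rewrite)
for ex in augmented:
    print(ex.text)
```

In this sketch the filtering step is what limits manual effort: rewrites that lose an annotated entity are rejected automatically instead of being re-annotated by hand. A real pipeline would replace `toy_rewrite` with a model call and would likely need a stricter consistency check than substring matching.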