🤖 AI Summary
This work addresses the mismatch between human-authored annotation guidelines and large language model (LLM)-based text annotation: such guidelines are typically informal, ambiguous, and context-dependent because they assume annotators who internalize conventions through training. We propose a moderation-oriented guideline repurposing method that automatically transforms natural-language guidelines into structured, semantically precise, instruction-style rules suited to LLM comprehension, preserving the original semantic intent while improving executability and robustness. Evaluated on disease entity recognition with the NCBI Disease Corpus, the repurposed guidelines improve LLM annotation accuracy and consistency and enable automated, iterative guideline refinement. Error analysis further identifies key failure modes, including instruction ambiguity and insufficient coverage of edge cases. The study offers a reusable methodological framework for building high-quality, LLM-native annotation infrastructure.
📝 Abstract
This study investigates how existing annotation guidelines can be repurposed to instruct large language model (LLM) annotators in text annotation tasks. Traditional guidelines are written for human annotators who internalize conventions through training, whereas LLMs require explicit, structured instructions. We propose a moderation-oriented guideline repurposing method that transforms existing guidelines into clear, executable directives for LLMs through an LLM moderation process. Using the NCBI Disease Corpus as a case study, our experiments show that repurposed guidelines can effectively guide LLM annotators, while also revealing several practical challenges. The results highlight the potential of this workflow to support scalable, cost-effective refinement of annotation guidelines and automated annotation.
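The two-step workflow the abstract describes (moderate the guideline into explicit rules, then use those rules to instruct an LLM annotator) can be sketched as below. This is a minimal illustration, not the paper's implementation: `call_llm` is a hypothetical placeholder for any chat-completion API that takes a prompt string and returns the model's text reply, and both prompt templates are assumptions for illustration.

```python
# Sketch of the moderation-oriented repurposing workflow (assumptions:
# `call_llm` wraps some chat-completion API; prompt wording is illustrative).

REPURPOSE_PROMPT = (
    "Rewrite the following annotation guideline as explicit, numbered rules "
    "that an LLM annotator can follow directly. Preserve the original "
    "semantics; remove references to human training the model cannot access.\n\n"
    "Guideline:\n{guideline}"
)

ANNOTATE_PROMPT = (
    "You are an annotator for disease mentions. Follow these rules exactly:\n"
    "{rules}\n\nText:\n{text}\n"
    "Return each disease mention on its own line; return nothing else."
)


def repurpose_guideline(guideline: str, call_llm) -> str:
    """Step 1: have the moderator LLM restructure a human-oriented guideline."""
    return call_llm(REPURPOSE_PROMPT.format(guideline=guideline))


def annotate(text: str, rules: str, call_llm) -> list[str]:
    """Step 2: annotate one document using the repurposed rules."""
    reply = call_llm(ANNOTATE_PROMPT.format(rules=rules, text=text))
    return [line.strip() for line in reply.splitlines() if line.strip()]
```

In practice the paper's workflow iterates: annotation errors on a corpus such as NCBI Disease feed back into another moderation pass over the rules.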