🤖 AI Summary
Large language models (LLMs) exhibit suboptimal accuracy in social science text annotation due to the semantic complexity of such texts and the ambiguity of annotation guidelines.
Method: We propose PromptUltra, an automated prompt optimization framework tailored for annotation tasks—integrating zero-shot and few-shot prompting, LLM-based reasoning, and iterative search guided by evaluation feedback to systematically generate high-quality annotation prompts.
Contribution/Results: Experiments demonstrate that our approach significantly improves LLM annotation accuracy, achieving parity with or surpassing human annotator performance. We open-source PromptUltra, a lightweight browser-based tool enabling non-technical users to perform AI-assisted annotation with low entry barriers and high reliability. This work establishes the first reproducible and scalable prompt optimization paradigm specifically designed for social science text annotation.
📝 Abstract
Large Language Models have recently been applied to text annotation tasks from social sciences, equalling or surpassing the performance of human workers at a fraction of the cost. However, no inquiry has yet been made on the impact of prompt selection on labelling accuracy. In this study, we show that performance greatly varies between prompts, and we apply the method of automatic prompt optimization to systematically craft high quality prompts. We also provide the community with a simple, browser-based implementation of the method at https://prompt-ultra.github.io/ .