🤖 AI Summary
This work investigates how large language models (LLMs) can leverage "model memory" (reuse of their own prior labeling outputs) to improve performance in text annotation tasks. The authors systematically evaluate zero-shot, few-shot, and memory-augmented annotation on two political science datasets using GPT-4o and Llama 3.1. The empirical analysis shows that model memory improves annotation accuracy by 5–25% relative to zero-shot and few-shot baselines. Building on this finding, the study proposes memory reinforcement, a novel approach that combines model memory with reinforcement learning; it yields additional performance gains in three of four benchmark tasks. Methodologically, the study demonstrates that an LLM's own prior outputs can serve as an effective implicit knowledge source for annotation. Substantively, it points to a scalable annotation paradigm that reduces manual labeling effort while preserving reliability.
📝 Abstract
Generative Large Language Models (LLMs) have shown promising results in text annotation using zero-shot and few-shot learning. Yet these approaches do not allow the model to retain information from previous annotations, making each response independent of the preceding ones. This raises the question of whether model memory -- the LLM having knowledge about its own previous annotations in the same task -- affects performance. In this article, using OpenAI's GPT-4o and Meta's Llama 3.1 on two political science datasets, we demonstrate that allowing the model to retain information about its own previous classifications yields significant performance improvements: between 5 and 25% compared to zero-shot and few-shot learning. Moreover, memory reinforcement, a novel approach we propose that combines model memory and reinforcement learning, yields additional performance gains in three out of our four tests. These findings have important implications for applied researchers looking to improve performance and efficiency in LLM annotation tasks.
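The core mechanism -- replaying the model's own previous classifications in the prompt for each new item -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt template, the label set, and the `classify` stub (which stands in for a real GPT-4o or Llama 3.1 call) are all assumptions.

```python
def build_prompt(text, memory, labels=("political", "not political")):
    """Assemble an annotation prompt that replays the model's own
    previous classifications (its 'memory') before the new item."""
    lines = [f"Classify each text as one of: {', '.join(labels)}."]
    for prev_text, prev_label in memory:
        lines.append(f"Text: {prev_text}\nLabel: {prev_label}")
    lines.append(f"Text: {text}\nLabel:")
    return "\n\n".join(lines)

def classify(prompt):
    # Placeholder for an LLM call (e.g., GPT-4o or Llama 3.1).
    # Here: a trivial keyword stub so the sketch runs end to end.
    last_item = prompt.rsplit("Text:", 1)[1].lower()
    return "political" if "election" in last_item else "not political"

def annotate_with_memory(texts):
    memory = []   # grows with the model's OWN prior outputs -- no human labels
    results = []
    for text in texts:
        label = classify(build_prompt(text, memory))
        memory.append((text, label))
        results.append(label)
    return results
```

Unlike few-shot learning, the in-prompt examples here are self-generated rather than human-labeled, which is what makes the approach label-free at annotation time.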