🤖 AI Summary
Large language models lack an explicit mechanism to refer to specific spans of their input, and existing span annotation prompting strategies perform inconsistently as a result. This work systematically examines three families of approaches (input tagging, numerical indexing, and content matching) and proposes LogitMatch, a constrained decoding method that enforces alignment between model outputs and valid input spans in logit space, addressing the mismatch errors inherent in content matching. Experiments across four diverse tasks show that LogitMatch significantly outperforms existing content-matching methods and, in certain settings, surpasses the other strategies, while input tagging remains a robust baseline.
📝 Abstract
Large language models (LLMs) are increasingly used for text analysis tasks such as named entity recognition or error detection. Unlike encoder-based models, however, generative architectures lack an explicit mechanism to refer to specific parts of their input. This has led to a variety of ad-hoc prompting strategies for span labeling, often with inconsistent results. In this paper, we categorize these strategies into three families: tagging the input text, indexing numerical positions of spans, and matching span content. To address the limitations of content matching, we introduce LogitMatch, a new constrained decoding method that forces the model's output to align with valid input spans. We evaluate all methods across four diverse tasks. We find that while tagging remains a robust baseline, LogitMatch improves upon competitive matching-based methods by eliminating span-matching errors, and outperforms the other strategies in some setups.
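The abstract does not spell out how LogitMatch's constrained decoding works internally, but the general idea behind such methods can be sketched: at each decoding step, mask the logits of every token that would make the generated span stop being a contiguous span of the input, then decode from the surviving tokens. The sketch below is a minimal toy illustration of that idea, not the paper's implementation; `mock_logits`, `constrained_decode`, and the word-level "tokenizer" are all hypothetical stand-ins for a real model and vocabulary.

```python
import math

def valid_next_tokens(input_tokens, prefix):
    """Tokens that extend `prefix` so it remains a contiguous span of the input."""
    allowed = set()
    k = len(prefix)
    for start in range(len(input_tokens) - k):
        if input_tokens[start:start + k] == prefix:
            allowed.add(input_tokens[start + k])
    return allowed

def constrained_decode(input_tokens, logits_fn, max_len=8, eos="<eos>"):
    """Greedy decoding with logits masked to valid span continuations."""
    prefix = []
    for _ in range(max_len):
        scores = logits_fn(prefix)               # token -> raw logit from the "model"
        allowed = valid_next_tokens(input_tokens, prefix)
        if prefix:                               # a non-empty span may end here
            allowed.add(eos)
        # Mask: keep only allowed tokens; anything the model never scored gets -inf.
        masked = {t: scores.get(t, -math.inf) for t in allowed}
        if not masked:
            break
        best = max(masked, key=masked.get)
        if best == eos or masked[best] == -math.inf:
            break
        prefix.append(best)
    return prefix

def mock_logits(prefix):
    # A toy "model" that would drift off-span without the mask: after "brown"
    # its top-scoring token is "jumps", but the true continuation is "fox".
    table = {
        (): {"brown": 2.0, "the": 1.0},
        ("brown",): {"jumps": 3.0, "fox": 2.0, "<eos>": 1.0},
        ("brown", "fox"): {"<eos>": 3.0, "jumps": 1.0},
    }
    return table.get(tuple(prefix), {"<eos>": 1.0})

tokens = "the quick brown fox jumps".split()
print(constrained_decode(tokens, mock_logits))   # -> ['brown', 'fox']
```

Without the mask, this toy model would emit "brown jumps", which matches no input span; with it, decoding is guaranteed to return a string that content matching can always resolve, which is the failure mode the paper attributes to plain content-matching prompts.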