🤖 AI Summary
Large language models lack an explicit mechanism to refer to specific spans of their input, and existing span annotation prompting strategies perform inconsistently as a result. This work systematically examines three families of approaches (input tagging, numerical indexing, and content matching) and proposes LogitMatch, a constrained decoding method that enforces alignment between model outputs and valid input spans in logit space, addressing the mismatch errors inherent in content matching. Experiments across four diverse tasks show that LogitMatch significantly outperforms existing content-matching methods and, in certain settings, surpasses the other strategies, while input tagging remains a robust baseline.
📝 Abstract
Large language models (LLMs) are increasingly used for text analysis tasks such as named entity recognition or error detection. Unlike encoder-based models, however, generative architectures lack an explicit mechanism to refer to specific parts of their input. This has led to a variety of ad-hoc prompting strategies for span labeling, often with inconsistent results. In this paper, we categorize these strategies into three families: tagging the input text, indexing numerical positions of spans, and matching span content. To address the limitations of content matching, we introduce LogitMatch, a new constrained decoding method that forces the model's output to align with valid input spans. We evaluate all methods across four diverse tasks. We find that while tagging remains a robust baseline, LogitMatch improves upon competitive matching-based methods by eliminating span-matching errors, and outperforms the other strategies in some setups.
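The abstract does not spell out how LogitMatch's constrained decoding works internally, but the general idea behind such methods can be sketched: at each decoding step, mask the logits of every token that would make the generated span stop being a contiguous span of the input, then decode from the surviving tokens. The sketch below is a minimal toy illustration of that idea, not the paper's implementation; `mock_logits`, `constrained_decode`, and the word-level "tokenizer" are all hypothetical stand-ins for a real model and vocabulary.

```python
import math

def valid_next_tokens(input_tokens, prefix):
    """Tokens that extend `prefix` so it remains a contiguous span of the input."""
    allowed = set()
    k = len(prefix)
    for start in range(len(input_tokens) - k):
        if input_tokens[start:start + k] == prefix:
            allowed.add(input_tokens[start + k])
    return allowed

def constrained_decode(input_tokens, logits_fn, max_len=8, eos="<eos>"):
    """Greedy decoding with logits masked to valid span continuations."""
    prefix = []
    for _ in range(max_len):
        scores = logits_fn(prefix)               # token -> raw logit from the "model"
        allowed = valid_next_tokens(input_tokens, prefix)
        if prefix:                               # a non-empty span may end here
            allowed.add(eos)
        # Mask: keep only allowed tokens; anything the model never scored gets -inf.
        masked = {t: scores.get(t, -math.inf) for t in allowed}
        if not masked:
            break
        best = max(masked, key=masked.get)
        if best == eos or masked[best] == -math.inf:
            break
        prefix.append(best)
    return prefix

def mock_logits(prefix):
    # A toy "model" that would drift off-span without the mask: after "brown"
    # its top-scoring token is "jumps", but the true continuation is "fox".
    table = {
        (): {"brown": 2.0, "the": 1.0},
        ("brown",): {"jumps": 3.0, "fox": 2.0, "<eos>": 1.0},
        ("brown", "fox"): {"<eos>": 3.0, "jumps": 1.0},
    }
    return table.get(tuple(prefix), {"<eos>": 1.0})

tokens = "the quick brown fox jumps".split()
print(constrained_decode(tokens, mock_logits))   # -> ['brown', 'fox']
```

Without the mask, this toy model would emit "brown jumps", which matches no input span; with it, decoding is guaranteed to return a string that content matching can always resolve, which is the failure mode the paper attributes to plain content-matching prompts.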