Supervised In-Context Fine-Tuning for Generative Sequence Labeling

📅 2025-08-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Causal large language models (LLMs) exhibit limited performance on generative sequence labeling (SL) tasks due to suboptimal prompting and long-context interference. Method: We propose SIFT, a supervised in-context fine-tuning framework that casts SL as constrained response generation, integrating in-context learning, supervised fine-tuning, and constrained decoding. Crucially, we find that instruction templates, previously assumed essential, are largely unnecessary; removing them mitigates long-context interference and improves label consistency. Contribution/Results: SIFT establishes the first supervised in-context fine-tuning paradigm tailored to SL. We empirically demonstrate that prompt minimization significantly enhances LLM accuracy and robustness on benchmark tasks such as named entity recognition. Across multiple standard SL benchmarks, SIFT consistently outperforms both vanilla in-context learning and decoder-as-encoder fine-tuning baselines, validating the efficacy and promise of generative paradigms for structured prediction.
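To make the setup concrete, here is a minimal, hypothetical sketch of how a SIFT-style training example might be assembled: a few in-context demonstrations (sentence plus gold labeled response) are concatenated with the query sentence, and supervised fine-tuning would compute the loss only on the final response. Consistent with the paper's finding, no instruction template is prepended. All function names and the prompt layout are illustrative assumptions, not the authors' actual code.

```python
def format_response(tokens, labels):
    """Render a gold sequence-labeling answer as text the model should
    generate, e.g. 'Paris/B-LOC is/O nice/O'."""
    return " ".join(f"{t}/{l}" for t, l in zip(tokens, labels))

def build_sift_prompt(demos, query_tokens):
    """Concatenate in-context demonstrations and the query into one prompt.

    demos: list of (tokens, labels) pairs used as demonstrations.
    query_tokens: the sentence the model must label.
    During fine-tuning, the supervised target (the labeled response for the
    query) would be appended after the trailing 'Labeled:' line.
    """
    parts = []
    for tokens, labels in demos:
        parts.append("Sentence: " + " ".join(tokens))
        parts.append("Labeled: " + format_response(tokens, labels))
    parts.append("Sentence: " + " ".join(query_tokens))
    parts.append("Labeled:")
    return "\n".join(parts)

demo = (["Paris", "is", "nice"], ["B-LOC", "O", "O"])
prompt = build_sift_prompt([demo], ["Obama", "spoke"])
```

The demonstration/query layout here is one plausible choice; the key point is that the label sequence is produced as a generated response rather than via token-level classification heads.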

📝 Abstract
Sequence labeling (SL) tasks, where labels are assigned to tokens, are abundant in NLP (e.g., named entity recognition and aspect-based sentiment analysis). Owing to the intuition that they require bidirectional context, SL tasks are commonly tackled with encoder-only models. Recent work also shows that removing the causal mask in fine-tuning enables decoder-based LLMs to become effective token classifiers. Less work, however, has focused on (supervised) generative SL, a more natural setting for causal LLMs. Due to their rapid scaling, causal LLMs applied to SL are expected to outperform encoders, whose own development has stagnated. In this work, we propose supervised in-context fine-tuning (SIFT) for generative SL. SIFT casts SL tasks as constrained response generation, natural to LLMs, combining (1) in-context learning (ICL) from demonstrations with (2) supervised fine-tuning. SIFT considerably outperforms both ICL and decoder-as-encoder fine-tuning baselines on a range of standard SL tasks. We further find that although long context hinders the performance of generative SL in both ICL and SIFT, this deficiency can be mitigated by removing the instruction, as instructions are shown to be largely unnecessary for achieving strong SL performance with SIFT. Our findings highlight strengths and limitations of SL with LLMs, underscoring the importance of a response-based generative task formulation for effective SL performance.
Problem

Research questions and friction points this paper is trying to address.

Improving generative sequence labeling with supervised in-context fine-tuning
Addressing limitations of causal LLMs in token classification tasks
Enhancing performance on standard sequence labeling benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised in-context fine-tuning for generative sequence labeling
Combines in-context learning with supervised fine-tuning
Casts sequence labeling as constrained response generation
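The "constrained response generation" idea can be illustrated with a toy greedy decoder: the output is forced to copy each input token verbatim and may only attach a label drawn from a fixed tag set, which rules out malformed responses by construction. The scoring function below is a stand-in for LLM logits; the whole snippet is an assumed sketch of the general technique, not the paper's implementation.

```python
LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]

def constrained_label(tokens, score_fn):
    """Greedily emit one 'token/label' pair per input token.

    score_fn(token, label) -> float plays the role of the model's logit for
    generating `label` after `token`; only labels from LABELS are considered,
    so the decoded response is always well-formed.
    """
    out = []
    for tok in tokens:
        best = max(LABELS, key=lambda lab: score_fn(tok, lab))
        out.append(f"{tok}/{best}")
    return " ".join(out)

# Toy scorer for illustration: capitalized words look like person names.
def toy_score(token, label):
    if token[0].isupper():
        return 1.0 if label == "B-PER" else 0.0
    return 1.0 if label == "O" else 0.0

print(constrained_label(["Obama", "spoke", "today"], toy_score))
# With this toy scorer: Obama/B-PER spoke/O today/O
```

In a real system the scorer would come from the fine-tuned LLM, but the constraint mechanism is the same: invalid continuations are masked out at each decoding step.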