Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs

๐Ÿ“… 2025-02-20
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work investigates the capability of open-source large language models (LLMs) to predict citation intentโ€”without domain-specific pretraining. We systematically evaluate adaptation paradigms including zero-shot inference, in-context learning (with 1, few, or multiple examples), and full supervised fine-tuning across 12 open-source LLM variants spanning five major families. Our empirical study is the first to demonstrate that general-purpose open-source LLMs, with only lightweight fine-tuning, can match or surpass domain-specialized models such as SciBERT. We propose and publicly release an end-to-end, reproducible evaluation framework for citation intent classification. Results delineate the performance boundaries of LLMs of varying scales and architectures on fine-grained academic intent recognition, identify the optimal base model, and achieve a significant improvement of +8.2% F1 score. This establishes a low-cost, highly adaptable paradigm for academic NLP modeling.

Technology Category

Application Category

๐Ÿ“ Abstract
This work investigates the ability of open Large Language Models (LLMs) to predict citation intent through in-context learning and fine-tuning. Unlike traditional approaches that rely on pre-trained models like SciBERT, which require extensive domain-specific pretraining and specialized architectures, we demonstrate that general-purpose LLMs can be adapted to this task with minimal task-specific data. We evaluate twelve model variations across five prominent open LLM families using zero, one, few, and many-shot prompting to assess performance across scenarios. Our experimental study identifies the top-performing model through extensive experimentation of in-context learning-related parameters, which we fine-tune to further enhance task performance. The results highlight the strengths and limitations of LLMs in recognizing citation intents, providing valuable insights for model selection and prompt engineering. Additionally, we make our end-to-end evaluation framework and models openly available for future use.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' citation intent prediction
Comparing in-context learning vs fine-tuning
Evaluating open LLMs' adaptability with minimal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs predict citation intent
In-context learning and fine-tuning
General-purpose LLMs adaptation