Why do you cite? An investigation on citation intents and decision-making classification processes

📅 2024-07-18

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses automatic classification of citation intents in academic papers. We propose a two-layer ensemble classification framework that integrates SciBERT and XLNet and introduces section headings as a structured feature, systematically demonstrating that they yield significant performance gains. We also pair the ensemble with an interpretability mechanism that jointly leverages SHAP and LIME, improving decision transparency alongside predictive accuracy. Evaluated on the SciCite benchmark, the approach achieves 89.46% Macro-F1, a new state of the art at the time. We also release an interactive Flask-based web tool for real-time, interpretable citation intent analysis. The core contributions are: (1) effective use of structured textual features (i.e., section headings); and (2) an ensemble design paired with interpretability analysis that balances accuracy with model trustworthiness.

📝 Abstract

Identifying the reason for which an author cites another work is essential to understand the nature of scientific contributions and to assess their impact. Citations are one of the pillars of scholarly communication, and most metrics employed to analyze these conceptual links are based on quantitative observations. Behind the act of referencing another scholarly work there is a whole world of meanings that needs to be proficiently and effectively revealed. This study emphasizes the importance of reliably classifying citation intents to provide more comprehensive and insightful analyses in research assessment. We address this task by presenting a study that uses advanced Ensemble Strategies for Citation Intent Classification (CIC), incorporating Language Models (LMs) and employing Explainable AI (XAI) techniques to enhance the interpretability and trustworthiness of the models' predictions. Our approach involves two ensemble classifiers that use fine-tuned SciBERT and XLNet LMs as baselines. We further demonstrate the critical role of section titles as a feature in improving model performance. The study also introduces a web application for classifying citation intents, developed with Flask and currently available at http://137.204.64.4:81/cic/classifier. One of our models sets a new state of the art (SOTA) with an 89.46% Macro-F1 score on the SciCite benchmark. The integration of XAI techniques provides insights into the decision-making processes, highlighting the contributions of individual words for level-0 classifications and of individual models for the metaclassification. The findings suggest that the inclusion of section titles significantly enhances classification performance in the CIC task. Our contributions provide useful insights for developing more robust datasets and methodologies, thus fostering a deeper understanding of scholarly communication.
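For readers unfamiliar with the metric quoted in the abstract, the sketch below shows how a Macro-F1 score is computed: per-class F1 scores are averaged with equal weight, regardless of class frequency. It is a minimal illustration in plain Python, not the authors' evaluation code. The function name `macro_f1` and the toy labels are assumptions made for this example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper, not from the paper)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with made-up labels (not data from the paper).
gold = ["background", "background", "method", "result"]
pred = ["background", "method", "method", "result"]
score = macro_f1(gold, pred)
```

Because every class counts equally, a model that does well only on the most frequent intent cannot reach a high Macro-F1, which is why the metric suits imbalanced label sets.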

Problem

Research questions and friction points this paper is trying to address.

Classifies citation intents using an ensemble framework.

Improves interpretability with SHAP and structural context.

Achieves state-of-the-art performance on the SciCite benchmark (89.46% Macro-F1).

Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble framework using SciBERT and XLNet models

SHAP analyses for interpretable citation intent classification

Incorporates section titles to enhance classification accuracy
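The three ideas listed above can be combined in a rough sketch: a section title is attached to the citation sentence, two level-0 models each produce class probabilities, and a meta-classifier merges them. Everything here is an assumption for illustration only. The `[SEP]` join in `build_input`, the weighted-average meta-classifier, the weights, and the hard-coded probabilities are hypothetical stand-ins; the page does not describe how the paper encodes section titles or which meta-classifier it uses.

```python
# Assumed label set for this sketch; the page itself does not list the classes.
LABELS = ("background", "method", "result")


def build_input(section_title, citation_sentence):
    """Hypothetical way to expose the section title to a language model."""
    return f"{section_title.strip()} [SEP] {citation_sentence.strip()}"


def meta_classify(level0_probs, weights):
    """Combine level-0 probabilities with a weighted average (assumed meta-classifier)."""
    combined = {label: 0.0 for label in LABELS}
    for model_name, probs in level0_probs.items():
        for label in LABELS:
            combined[label] += weights[model_name] * probs[label]
    predicted = max(combined, key=combined.get)
    # Share of the winning score contributed by each level-0 model.
    contributions = {
        name: weights[name] * probs[predicted]
        for name, probs in level0_probs.items()
    }
    return predicted, combined, contributions


# Made-up level-0 outputs standing in for fine-tuned SciBERT and XLNet.
text = build_input("Methods", "We follow the procedure of [12].")
level0 = {
    "scibert": {"background": 0.2, "method": 0.7, "result": 0.1},
    "xlnet": {"background": 0.5, "method": 0.4, "result": 0.1},
}
label, scores, contributions = meta_classify(level0, {"scibert": 0.6, "xlnet": 0.4})
```

The `contributions` dictionary is a crude stand-in for the model-level explanations mentioned in the abstract. A real analysis would apply a dedicated attribution method such as SHAP to the trained meta-classifier.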

🔎 Similar Papers

Socio-cognitive Networks between Researchers