🤖 AI Summary
This study addresses argument mining in educational settings—specifically, identifying, classifying, and evaluating arguments in student persuasive essays. We propose a lightweight framework based on small open-source decoder-only large language models (e.g., Phi-3, TinyLlama). Methodologically, it integrates few-shot prompting with supervised fine-tuning, leveraging sequence labeling for argument segmentation and text classification for type identification and quality assessment—enabling localized, low-overhead, privacy-preserving real-time feedback. Our key contribution is the first systematic investigation of multi-task adaptability of compact LLMs for educational argument mining, moving beyond traditional encoder-based architectures while balancing deployability and performance. Experiments show that fine-tuned models significantly outperform baselines on the Feedback Prize dataset (argument segmentation F1 +8.2%, type classification accuracy +6.5%); few-shot prompting achieves baseline-level performance in quality assessment, validating the efficacy of the lightweight paradigm.
📝 Abstract
Argument mining algorithms analyze the argumentative structure of essays, making them a valuable tool for enhancing education by providing targeted feedback on the students' argumentation skills. While current methods often use encoder or encoder-decoder deep learning architectures, decoder-only models remain largely unexplored, offering a promising research direction. This paper proposes leveraging open-source, small Large Language Models (LLMs) for argument mining through few-shot prompting and fine-tuning. These models' small size and open-source nature ensure accessibility, privacy, and computational efficiency, enabling schools and educators to adopt and deploy them locally. Specifically, we perform three tasks: segmentation of student essays into arguments, classification of the arguments by type, and assessment of their quality. We empirically evaluate the models on the Feedback Prize - Predicting Effective Arguments dataset of grade 6-12 students essays and demonstrate how fine-tuned small LLMs outperform baseline methods in segmenting the essays and determining the argument types while few-shot prompting yields comparable performance to that of the baselines in assessing quality. This work highlights the educational potential of small, open-source LLMs to provide real-time, personalized feedback, enhancing independent learning and writing skills while ensuring low computational cost and privacy.