Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the need for automated annotation of educational assessments according to Bloom's Taxonomy (six cognitive levels). It systematically benchmarks traditional machine learning (Naïve Bayes, SVM), deep neural networks (LSTM, BiLSTM, GRU, BiGRU, BERT, RoBERTa), and large language models (GPT-4, Gemini) under zero-shot settings. To mitigate data scarcity, the authors propose a data augmentation strategy combining synonym substitution and word-embedding-guided perturbation. Results show that the augmented SVM achieves 94% accuracy and F1-score, substantially outperforming zero-shot LLMs, whose best accuracy is only 73%. The core contribution is the empirical demonstration that lightweight, domain-adapted models, coupled with judicious data augmentation, can surpass large pre-trained models on constrained educational text classification tasks. This finding offers both methodological guidance and empirical validation for building cost-effective, high-consistency automated assessment tools in resource-limited educational settings.
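The synonym-substitution half of the augmentation strategy can be illustrated with a minimal sketch. The synonym table and `synonym_substitute` function below are hypothetical stand-ins for the paper's WordNet- or embedding-based lookup, not its actual implementation.

```python
import random

# Toy synonym table standing in for a WordNet/embedding lookup (hypothetical).
SYNONYMS = {
    "describe": ["explain", "outline"],
    "evaluate": ["assess", "judge"],
    "design": ["create", "devise"],
    "compare": ["contrast", "relate"],
}

def synonym_substitute(sentence: str, seed: int = 0) -> str:
    """Replace each word that has an entry in SYNONYMS with a random synonym,
    producing an augmented copy of the sentence."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        key = word.lower()
        out.append(rng.choice(SYNONYMS[key]) if key in SYNONYMS else word)
    return " ".join(out)

print(synonym_substitute("Describe and evaluate the sorting algorithm"))
```

In practice each labelled sentence would yield several such paraphrases, enlarging the training set while preserving the Bloom-level label.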

📝 Abstract
This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's Taxonomy. A small dataset of 600 sentences labeled with the six cognitive categories (Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation) was processed using traditional machine learning (ML) models (Naive Bayes, Logistic Regression, Support Vector Machines), recurrent neural network architectures (LSTM, BiLSTM, GRU, BiGRU), transformer-based models (BERT and RoBERTa), and large language models (OpenAI, Gemini, Ollama, Anthropic). Each model was evaluated under different preprocessing and augmentation strategies (for example, synonym replacement and word-embedding-based substitution). Among traditional ML approaches, Support Vector Machines (SVM) with data augmentation achieved the best overall performance, reaching 94 percent accuracy, recall, and F1 scores with minimal overfitting. In contrast, the RNN models and BERT suffered from severe overfitting, while RoBERTa initially avoided it but began to show signs of overfitting as training progressed. Finally, zero-shot evaluations of large language models (LLMs) indicated that OpenAI and Gemini performed best among the tested LLMs, achieving approximately 0.72-0.73 accuracy and comparable F1 scores. These findings highlight the challenges of training complex deep models on limited data and underscore the value of careful data augmentation and simpler algorithms (such as augmented SVM) for Bloom's Taxonomy classification.
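The best-performing traditional pipeline described in the abstract can be sketched with scikit-learn as a TF-IDF vectorizer feeding a linear SVM. The tiny labelled sample below is invented for demonstration and is not the paper's 600-sentence dataset; feature settings are assumptions, not the authors' exact configuration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labelled learning outcomes (illustrative only).
sentences = [
    "define the key terms of the topic",
    "list the main components of a computer",
    "explain how the algorithm works",
    "summarise the results in your own words",
    "apply the formula to solve the problem",
    "use the method on a new dataset",
    "compare the two sorting algorithms",
    "analyse the causes of the failure",
    "design a new experiment to test the hypothesis",
    "develop a plan combining both approaches",
    "justify your choice of model",
    "evaluate the strengths and weaknesses of the approach",
]
labels = [
    "Knowledge", "Knowledge",
    "Comprehension", "Comprehension",
    "Application", "Application",
    "Analysis", "Analysis",
    "Synthesis", "Synthesis",
    "Evaluation", "Evaluation",
]

# TF-IDF features (unigrams + bigrams) feeding a linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["explain the concept of recursion"]))
```

Augmented variants of each sentence (e.g. synonym-replaced copies) would simply be appended to `sentences` and `labels` before fitting.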
Problem

Research questions and friction points this paper is trying to address.

Automatically classifying exam questions using Bloom's Taxonomy categories
Evaluating machine learning models on limited educational data classification
Addressing overfitting challenges in cognitive level classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated classification using Bloom's Taxonomy categories
Evaluated traditional ML, neural networks, and large language models
Augmented SVM achieved highest accuracy with minimal overfitting
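The zero-shot LLM evaluation mentioned above amounts to prompting a model to pick one of the six Bloom levels per question. The prompt wording below is an assumption for illustration; the paper's exact prompts are not reproduced here, and the sketch builds the prompt string only, without any provider API call.

```python
# The six Bloom's Taxonomy levels used as the label set.
BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]

def zero_shot_prompt(question: str) -> str:
    """Build a zero-shot classification prompt for one exam question
    (illustrative wording; not the paper's actual prompt)."""
    return (
        "Classify the following exam question into exactly one of Bloom's "
        f"Taxonomy levels: {', '.join(BLOOM_LEVELS)}.\n"
        f"Question: {question}\n"
        "Answer with the level name only."
    )

print(zero_shot_prompt("Evaluate the trade-offs between BFS and DFS."))
```

The model's single-word reply would then be matched against `BLOOM_LEVELS` and scored against the gold label to obtain the reported ~0.72-0.73 accuracy.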
Ramya Kumar
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Dhruv Gulwani
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Sonit Singh
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Artificial Intelligence · Computer Vision · Natural Language Processing · Machine Learning · Deep Learning