Automated Analysis of Learning Outcomes and Exam Questions Based on Bloom's Taxonomy

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the need for automated annotation of educational assessments according to Bloom's Taxonomy (six cognitive levels). It systematically benchmarks traditional machine learning (Naïve Bayes, SVM), deep neural networks (LSTM, BiLSTM, GRU, BiGRU, BERT, RoBERTa), and large language models (GPT-4, Gemini) under zero-shot settings. To mitigate data scarcity, the authors propose a data augmentation strategy combining synonym substitution and word-embedding-guided perturbation. Results show that the augmented SVM achieves 94% accuracy and F1-score, substantially outperforming zero-shot LLMs, whose best accuracy is only 73%. The core contribution is the empirical demonstration that lightweight, domain-adapted models, coupled with judicious data augmentation, can surpass large pre-trained models on constrained educational text classification tasks. This finding offers both methodological guidance and empirical validation for building cost-effective, high-consistency automated assessment tools in resource-limited educational settings.
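The synonym-substitution half of the augmentation strategy can be illustrated with a minimal sketch. The synonym table and `synonym_substitute` function below are hypothetical stand-ins for the paper's WordNet- or embedding-based lookup, not its actual implementation.

```python
import random

# Toy synonym table standing in for a WordNet/embedding lookup (hypothetical).
SYNONYMS = {
    "describe": ["explain", "outline"],
    "evaluate": ["assess", "judge"],
    "design": ["create", "devise"],
    "compare": ["contrast", "relate"],
}

def synonym_substitute(sentence: str, seed: int = 0) -> str:
    """Replace each word that has an entry in SYNONYMS with a random synonym,
    producing an augmented copy of the sentence."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        key = word.lower()
        out.append(rng.choice(SYNONYMS[key]) if key in SYNONYMS else word)
    return " ".join(out)

print(synonym_substitute("Describe and evaluate the sorting algorithm"))
```

In practice each labelled sentence would yield several such paraphrases, enlarging the training set while preserving the Bloom-level label.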

📝 Abstract
This paper explores the automatic classification of exam questions and learning outcomes according to Bloom's Taxonomy. A small dataset of 600 sentences labeled with the six cognitive categories (Knowledge, Comprehension, Application, Analysis, Synthesis, and Evaluation) was processed using traditional machine learning (ML) models (Naive Bayes, Logistic Regression, Support Vector Machines), recurrent neural network architectures (LSTM, BiLSTM, GRU, BiGRU), transformer-based models (BERT and RoBERTa), and large language models (OpenAI, Gemini, Ollama, Anthropic). Each model was evaluated under different preprocessing and augmentation strategies (for example, synonym replacement and word-embedding-based substitution). Among traditional ML approaches, Support Vector Machines (SVM) with data augmentation achieved the best overall performance, reaching 94 percent accuracy, recall, and F1 scores with minimal overfitting. In contrast, the RNN models and BERT suffered from severe overfitting, while RoBERTa initially avoided it but began to show signs of overfitting as training progressed. Finally, zero-shot evaluations of large language models (LLMs) indicated that OpenAI and Gemini performed best among the tested LLMs, achieving approximately 0.72-0.73 accuracy and comparable F1 scores. These findings highlight the challenges of training complex deep models on limited data and underscore the value of careful data augmentation and simpler algorithms (such as augmented SVM) for Bloom's Taxonomy classification.
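The best-performing traditional pipeline described in the abstract can be sketched with scikit-learn as a TF-IDF vectorizer feeding a linear SVM. The tiny labelled sample below is invented for demonstration and is not the paper's 600-sentence dataset; feature settings are assumptions, not the authors' exact configuration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy labelled learning outcomes (illustrative only).
sentences = [
    "define the key terms of the topic",
    "list the main components of a computer",
    "explain how the algorithm works",
    "summarise the results in your own words",
    "apply the formula to solve the problem",
    "use the method on a new dataset",
    "compare the two sorting algorithms",
    "analyse the causes of the failure",
    "design a new experiment to test the hypothesis",
    "develop a plan combining both approaches",
    "justify your choice of model",
    "evaluate the strengths and weaknesses of the approach",
]
labels = [
    "Knowledge", "Knowledge",
    "Comprehension", "Comprehension",
    "Application", "Application",
    "Analysis", "Analysis",
    "Synthesis", "Synthesis",
    "Evaluation", "Evaluation",
]

# TF-IDF features (unigrams + bigrams) feeding a linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["explain the concept of recursion"]))
```

Augmented variants of each sentence (e.g. synonym-replaced copies) would simply be appended to `sentences` and `labels` before fitting.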
Problem

Research questions and friction points this paper is trying to address.

Automatically classifying exam questions using Bloom's Taxonomy categories
Evaluating machine learning models on limited educational data classification
Addressing overfitting challenges in cognitive level classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated classification using Bloom's Taxonomy categories
Evaluated traditional ML, neural networks, and large language models
Augmented SVM achieved highest accuracy with minimal overfitting
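The zero-shot LLM evaluation mentioned above amounts to prompting a model to pick one of the six Bloom levels per question. The prompt wording below is an assumption for illustration; the paper's exact prompts are not reproduced here, and the sketch builds the prompt string only, without any provider API call.

```python
# The six Bloom's Taxonomy levels used as the label set.
BLOOM_LEVELS = ["Knowledge", "Comprehension", "Application",
                "Analysis", "Synthesis", "Evaluation"]

def zero_shot_prompt(question: str) -> str:
    """Build a zero-shot classification prompt for one exam question
    (illustrative wording; not the paper's actual prompt)."""
    return (
        "Classify the following exam question into exactly one of Bloom's "
        f"Taxonomy levels: {', '.join(BLOOM_LEVELS)}.\n"
        f"Question: {question}\n"
        "Answer with the level name only."
    )

print(zero_shot_prompt("Evaluate the trade-offs between BFS and DFS."))
```

The model's single-word reply would then be matched against `BLOOM_LEVELS` and scored against the gold label to obtain the reported ~0.72-0.73 accuracy.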
Ramya Kumar
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Dhruv Gulwani
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Sonit Singh
School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
Artificial Intelligence · Computer Vision · Natural Language Processing · Machine Learning · Deep Learning