Bayesian Active Learning for Multi-Criteria Comparative Judgement in Educational Assessment

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
In educational assessment, comparative judgement (CJ) excels at holistic ranking but struggles to support criterion-based, multidimensional competency breakdowns and fine-grained feedback, leaving a methodological gap between holistic and standards-aligned evaluation. To bridge this gap, the paper proposes the Multi-Criteria Bayesian Comparative Judgement (MC-BCJ) framework, which extends Bayesian preference modelling to jointly infer multiple independent learning outcomes, producing both holistic and criterion-specific ordinal rankings while quantifying predictive uncertainty. MC-BCJ combines entropy-driven active learning for selecting informative pairwise comparisons, multi-output ordinal inference, and an interpretable measure of assessor agreement. Evaluated on synthetic and real-world educational datasets, MC-BCJ improves annotation efficiency and per-criterion prediction accuracy, delivering holistic and criterion-level rankings with uncertainty estimates and explicit agreement metrics, closing the gap between holistic judgement and standards-oriented assessment.

📝 Abstract
Comparative Judgement (CJ) provides an alternative assessment approach by evaluating work holistically rather than breaking it into discrete criteria. This method leverages human ability to make nuanced comparisons, yielding more reliable and valid assessments. CJ aligns with real-world evaluations, where overall quality emerges from the interplay of various elements. However, rubrics remain widely used in education, offering structured criteria for grading and detailed feedback. This creates a gap between CJ's holistic ranking and the need for criterion-based performance breakdowns. This paper addresses this gap using a Bayesian approach. We build on Bayesian CJ (BCJ) by Gray et al., which directly models preferences instead of using likelihoods over total scores, allowing for expected ranks with uncertainty estimation. Their entropy-based active learning method selects the most informative pairwise comparisons for assessors. We extend BCJ to handle multiple independent learning outcome (LO) components, defined by a rubric, enabling both holistic and component-wise predictive rankings with uncertainty estimates. Additionally, we propose a method to aggregate entropies and identify the most informative comparison for assessors. Experiments on synthetic and real data demonstrate our method's effectiveness. Finally, we address a key limitation of BCJ, which is the inability to quantify assessor agreement. We show how to derive agreement levels, enhancing transparency in assessment.
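The abstract describes three mechanisms: BCJ's direct modelling of pairwise preferences, entropy-based selection of the most informative comparison, and an extension that aggregates entropies across multiple learning outcome (LO) components. A minimal sketch of these ideas (not the authors' implementation: it assumes a simplified Beta-Bernoulli posterior per item pair, and aggregation by summing per-criterion entropies, both of which are illustrative choices):

```python
import itertools
import math

class BayesianCJ:
    """Toy Bayesian CJ: a Beta posterior over each pairwise preference.

    A hypothetical simplification of BCJ (Gray et al.), which models
    preferences directly rather than likelihoods over total scores.
    """

    def __init__(self, n_items, alpha=1.0, beta=1.0):
        self.n = n_items
        # counts[(i, j)] = [times i preferred over j, times j preferred over i]
        self.counts = {pair: [alpha, beta]
                       for pair in itertools.combinations(range(n_items), 2)}

    def update(self, winner, loser):
        if winner < loser:
            self.counts[(winner, loser)][0] += 1
        else:
            self.counts[(loser, winner)][1] += 1

    def p_beats(self, i, j):
        """Posterior-mean probability that item i is preferred over item j."""
        if i < j:
            a, b = self.counts[(i, j)]
            return a / (a + b)
        a, b = self.counts[(j, i)]
        return b / (a + b)

    def expected_ranks(self):
        # Expected rank of i: 1 plus the expected number of items beating i.
        return [1 + sum(self.p_beats(j, i) for j in range(self.n) if j != i)
                for i in range(self.n)]

    def pair_entropy(self, i, j):
        """Binary entropy of the preference: high when the outcome is uncertain."""
        p = self.p_beats(i, j)
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def most_informative_pair(self):
        return max(itertools.combinations(range(self.n), 2),
                   key=lambda pair: self.pair_entropy(*pair))


def most_informative_pair_multi(models):
    """Multi-criteria selection: aggregate per-criterion entropies by summing
    (one plausible aggregation; the paper proposes its own scheme)."""
    n = models[0].n
    return max(itertools.combinations(range(n), 2),
               key=lambda pair: sum(m.pair_entropy(*pair) for m in models))
```

In this sketch, each learning outcome would get its own model; an assessor's comparison on a given criterion updates that criterion's posterior, and the aggregated entropy picks the next pair to present.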
Problem

Research questions and friction points this paper is trying to address.

CJ's holistic rankings do not provide the criterion-based performance breakdowns that rubric-driven grading and feedback require.
Existing Bayesian CJ (BCJ) models a single holistic dimension and cannot rank multiple learning outcome (LO) components.
BCJ cannot quantify assessor agreement, limiting the transparency of the assessment process.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian approach for multi-criteria comparative judgement
Entropy-based active learning for informative comparisons
Quantification of assessor agreement for transparency
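The agreement contribution is described only at a high level here. One way such a measure could look, as a hedged sketch (this proxy, the average probability that two independent judgements of the same pair coincide under the inferred preference probabilities, is an assumption for illustration, not the paper's derivation):

```python
import itertools

def agreement_level(p_beats, n_items):
    """Hypothetical agreement proxy: the average probability that two
    independent judgements of the same pair coincide, given preference
    probabilities p_beats(i, j). Returns 0.5 under pure chance and 1.0
    when every pair is judged unanimously."""
    pairs = list(itertools.combinations(range(n_items), 2))
    return sum(p_beats(i, j) ** 2 + (1 - p_beats(i, j)) ** 2
               for i, j in pairs) / len(pairs)
```

Such a scalar makes the reliability of the inferred rankings explicit, which is the transparency goal the Innovation list names.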