🤖 AI Summary
Automated essay scoring (AES) systems face limited adoption in education due to their opaque "black-box" nature and their inability to provide actionable feedback. To address this, we propose an interpretable AES framework that replaces end-to-end score regression with a pedagogically grounded concept-bottleneck model aligned with instructional rubrics. Specifically, the model explicitly predicts eight core writing constructs (e.g., Thesis Clarity and Evidence Use) via a multi-head classifier atop a pretrained text encoder, then synthesizes the final score through a lightweight mapping network that operates only on those concept predictions. This architecture enables real-time teacher intervention on concept-level predictions, with immediate score updates, thereby supporting human-in-the-loop, accountable assessment. An interactive visualization interface facilitates transparent interpretation. Experiments demonstrate scoring accuracy competitive with state-of-the-art black-box large language models while delivering fine-grained, pedagogically meaningful feedback, significantly improving educator trust and student learning outcomes.
📝 Abstract
Understanding how automated grading systems evaluate essays remains a significant challenge for educators and students, especially when large language models function as black boxes. We introduce EssayCBM, a rubric-aligned framework that prioritizes interpretability in essay assessment. Instead of predicting grades directly from text, EssayCBM first scores eight writing concepts, such as Thesis Clarity and Evidence Use, through dedicated prediction heads on a pretrained encoder. These concept scores form a transparent bottleneck: a lightweight network computes the final grade from the concept scores alone. Instructors can adjust concept predictions and instantly view the updated grade, enabling accountable human-in-the-loop evaluation. EssayCBM matches black-box performance while offering actionable, concept-level feedback through an intuitive web interface.
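The bottleneck-and-intervention mechanism described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a linear head per concept and a linear mapping network with random stand-in weights, and only Thesis Clarity and Evidence Use are named in the text, so the other six concept names here are placeholders. The key property it demonstrates is that the grade depends on the essay *only* through the eight concept scores, so editing a concept score updates the grade immediately.

```python
import numpy as np

rng = np.random.default_rng(0)

# Only the first two concept names appear in the abstract; the rest are
# hypothetical placeholders for illustration.
CONCEPTS = ["Thesis Clarity", "Evidence Use", "Organization", "Coherence",
            "Word Choice", "Sentence Fluency", "Conventions", "Tone"]

D = 16  # toy encoder embedding dimension

# Stand-ins for learned parameters.
W_heads = rng.normal(size=(len(CONCEPTS), D))  # one linear head per concept
w_map = rng.normal(size=len(CONCEPTS))         # lightweight mapping network (linear)
b_map = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_concepts(embedding):
    """Concept heads: encoder embedding -> eight scores in [0, 1]."""
    return sigmoid(W_heads @ embedding)

def predict_grade(concepts):
    """Mapping network sees ONLY the concept scores (the bottleneck)."""
    return float(w_map @ concepts + b_map)

embedding = rng.normal(size=D)            # stand-in for the encoder's output
concept_scores = predict_concepts(embedding)
grade = predict_grade(concept_scores)

# Human-in-the-loop intervention: an instructor overrides one concept
# prediction, and the grade is recomputed from concepts alone.
edited = concept_scores.copy()
edited[CONCEPTS.index("Thesis Clarity")] = 1.0
new_grade = predict_grade(edited)
```

Because `predict_grade` never sees the raw embedding, the instructor's edit fully determines how the grade changes, which is what makes the intervention accountable.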