🤖 AI Summary
Problem: Traditional educational assessment overemphasizes solution correctness while neglecting students' sensemaking, i.e., how they construct conceptual understanding and reason through explanations of physical phenomena.
Method: We propose the first computationally operationalizable framework for assessing sensemaking, grounded in Physics Education Research (PER) theory. The approach integrates BERT, RoBERTa, and Sentence-BERT encoders within a multi-encoder machine learning architecture, coupled with a human-in-the-loop annotation and model distillation pipeline, and is validated end-to-end on 385 authentic student-written explanations.
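To make the architecture concrete, here is a minimal sketch of the shared-classifier idea, assuming the `sentence-transformers` and scikit-learn libraries; the `SensemakingScorer` class, its method names, and the choice to simply concatenate encoder embeddings are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch (assumptions noted above): embed each explanation with several
# pretrained encoders, then share one logistic-regression head across problems.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

class SensemakingScorer:
    def __init__(self, encoder_names):
        # One frozen encoder per model name (hypothetical choices, e.g. "all-MiniLM-L6-v2").
        self.encoders = [SentenceTransformer(name) for name in encoder_names]
        self.clf = LogisticRegression(max_iter=1000)

    def _embed(self, texts):
        # Concatenate every encoder's embedding into one feature vector per text.
        return np.hstack([enc.encode(texts) for enc in self.encoders])

    def fit(self, explanations, labels):
        # labels: 1 if human annotators judged the explanation to show sensemaking.
        self.clf.fit(self._embed(explanations), labels)
        return self

    def score(self, explanations):
        # Probability that an explanation shows evidence of sensemaking.
        return self.clf.predict_proba(self._embed(explanations))[:, 1]
```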
Contribution/Results: We find no significant linear correlation between sensemaking proficiency and problem-solving correctness, and automated sensemaking scoring achieves high inter-rater agreement with human annotators (Cohen's κ > 0.85). This work moves beyond the correctness-only paradigm, introducing a two-dimensional diagnostic ("correctness + understanding") that enables fine-grained formative assessment and actionable pedagogical feedback.
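As one way to picture the agreement claim, a Cohen's κ check between human and automated labels can be run with scikit-learn's `cohen_kappa_score`; the label arrays below are placeholders, not the study's data.

```python
# Hypothetical agreement check between a human annotator and the automated scorer.
from sklearn.metrics import cohen_kappa_score

human_labels = [1, 0, 1, 1, 0, 1, 0, 0]  # annotator: does the explanation show sensemaking?
model_labels = [1, 0, 1, 1, 0, 1, 1, 0]  # automated scorer's binary predictions

kappa = cohen_kappa_score(human_labels, model_labels)
print(f"Cohen's kappa = {kappa:.2f}")  # values above ~0.8 are conventionally read as strong agreement
```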
📝 Abstract
In the education system, problem-solving correctness is often inappropriately conflated with student learning. Advances in both Physics Education Research (PER) and Machine Learning (ML) provide the initial tools to develop a more meaningful and efficient measurement scheme for whether physics students are engaging in sensemaking: a learning process of figuring out the how and why of a particular phenomenon. In this work, we contribute such a measurement scheme, which quantifies the evidence of students' physical sensemaking given their written explanations for their solutions to physics problems. We outline how the proposed human annotation scheme can be automated into a deployable ML model using language encoders and shared probabilistic classifiers. The procedure is scalable to a large number of problems and students. We implement three distinct language encoders with logistic regression, and provide a deployability analysis on 385 real student explanations from the 2023 Introduction to Physics course at Tufts University. Furthermore, we compute sensemaking scores for all students and analyze these measurements alongside their corresponding problem-solving accuracies. We find no linear relationship between these two variables, supporting the hypothesis that one is not a reliable proxy for the other. We discuss how sensemaking scores can be used alongside problem-solving accuracies to provide a more nuanced snapshot of student performance in physics class.
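The correlation analysis described above can be sketched with SciPy's `pearsonr`; the per-student numbers here are placeholders standing in for the study's measured sensemaking scores and problem-solving accuracies.

```python
# Sketch of the score-vs-accuracy comparison: test for a linear relationship
# between per-student sensemaking scores and problem-solving accuracy.
import numpy as np
from scipy.stats import pearsonr

sensemaking_scores = np.array([0.72, 0.41, 0.88, 0.35, 0.60])  # mean model score per student (placeholder)
solving_accuracy = np.array([0.90, 0.85, 0.55, 0.70, 0.65])    # fraction of problems solved correctly (placeholder)

r, p = pearsonr(sensemaking_scores, solving_accuracy)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
# A small |r| with a large p-value would be consistent with the finding that
# problem-solving correctness is not a reliable proxy for sensemaking.
```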