Using machine learning to measure evidence of students' sensemaking in physics courses

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional educational assessment overemphasizes solution correctness while neglecting students’ sensemaking—i.e., their conceptual construction and explanatory reasoning about physical phenomena. Method: We propose the first computationally operationalizable framework for sensemaking, grounded in Physics Education Research (PER) theory. Our approach integrates BERT, RoBERTa, and Sentence-BERT within a multi-encoder machine learning architecture, coupled with a human-in-the-loop annotation and model distillation pipeline, validated end-to-end on 385 authentic student-generated explanations. Contribution/Results: We find no significant linear correlation between sensemaking proficiency and problem-solving correctness; automated sensemaking scoring achieves high inter-rater agreement (Cohen’s κ > 0.85). This work moves beyond the correctness-only paradigm, introducing a dual-dimensional diagnostic tool—“correctness + understanding”—that enables fine-grained formative assessment and actionable pedagogical feedback.
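A minimal sketch of how such a multi-encoder scoring scheme could be assembled, assuming the Hugging Face transformers and scikit-learn libraries; the encoder checkpoints, example explanations, and binary labels below are illustrative placeholders, not the authors' exact pipeline.

```python
# Hypothetical sketch: embed student explanations with several pretrained
# encoders, then fit a shared logistic-regression classifier per encoder
# to score the evidence of sensemaking in each explanation.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

def embed(texts, model_name):
    """Mean-pooled sentence embeddings from a pretrained encoder."""
    tok = AutoTokenizer.from_pretrained(model_name)
    enc = AutoModel.from_pretrained(model_name)
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state           # (n, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # ignore padding
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Illustrative data: written explanations with human "sensemaking" labels.
explanations = ["The block slows because friction removes kinetic energy.",
                "I used equation 3 and got 4.2 m/s."]
labels = np.array([1, 0])  # 1 = evidence of sensemaking, 0 = none

scores = {}
for name in ["bert-base-uncased", "roberta-base"]:  # assumed encoder choices
    X = embed(explanations, name)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    scores[name] = clf.predict_proba(X)[:, 1]  # per-explanation sensemaking score
```

Keeping the encoders frozen and training only a lightweight probabilistic classifier on top is one way a scheme like this could stay cheap to rerun as new problems and student explanations are added.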

📝 Abstract
In the education system, problem-solving correctness is often inappropriately conflated with student learning. Advances in both Physics Education Research (PER) and Machine Learning (ML) provide the initial tools to develop a more meaningful and efficient measurement scheme for whether physics students are engaging in sensemaking: a learning process of figuring out the how and why of a particular phenomenon. In this work, we contribute such a measurement scheme, which quantifies the evidence of students' physical sensemaking given their written explanations for their solutions to physics problems. We outline how the proposed human annotation scheme can be automated into a deployable ML model using language encoders and shared probabilistic classifiers. The procedure is scalable to a large number of problems and students. We implement three unique language encoders with logistic regression, and provide a deployability analysis on 385 real student explanations from the 2023 Introduction to Physics course at Tufts University. Furthermore, we compute sensemaking scores for all students, and analyze these measurements alongside their corresponding problem-solving accuracies. We find no linear relationship between these two variables, supporting the hypothesis that one is not a reliable proxy for the other. We discuss how sensemaking scores can be used alongside problem-solving accuracies to provide a more nuanced snapshot of student performance in physics class.
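The abstract's comparison between sensemaking scores and problem-solving accuracies could be reproduced with a simple correlation test; a small sketch assuming per-student arrays are already computed, with placeholder values rather than the paper's data.

```python
# Hypothetical check for a linear relationship between per-student
# sensemaking scores and problem-solving accuracies.
import numpy as np
from scipy import stats

sensemaking = np.array([0.71, 0.35, 0.88, 0.52, 0.60])  # placeholder scores
accuracy    = np.array([0.90, 0.85, 0.40, 0.75, 0.65])  # placeholder accuracies

r, p = stats.pearsonr(sensemaking, accuracy)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
# A small |r| with a large p-value would support the paper's finding that
# problem-solving accuracy is not a reliable proxy for sensemaking.
```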
Problem

Research questions and friction points this paper is trying to address.

Develops a machine learning model to measure students' sensemaking in physics.
Automates human annotation of student explanations using language encoders (an agreement-check sketch follows this list).
Analyzes sensemaking scores and problem-solving accuracies for nuanced performance insights.
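One way to check whether the automated annotation can stand in for human annotation is to measure label agreement; a minimal sketch assuming paired human and model labels are available, using scikit-learn's cohen_kappa_score with illustrative placeholder labels.

```python
# Hypothetical agreement check between human annotations and model labels.
from sklearn.metrics import cohen_kappa_score

human = [1, 0, 1, 1, 0, 1, 0, 0]  # placeholder human sensemaking labels
model = [1, 0, 1, 0, 0, 1, 0, 0]  # placeholder automated labels

kappa = cohen_kappa_score(human, model)
print(f"Cohen's kappa = {kappa:.2f}")  # values near 1 indicate strong agreement
```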
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates sensemaking measurement via machine learning
Uses language encoders with logistic regression
Analyzes sensemaking scores and problem-solving accuracies
Kaitlin Gili
Department of Computer Science, Tufts University, Medford, MA, U.S.A.
Kyle Heuton
Department of Computer Science, Tufts University, Medford, MA, U.S.A.
Astha Shah
Department of Computer Science, Tufts University, Medford, MA, U.S.A.
Michael C. Hughes
Assistant Professor of Computer Science, Tufts University
Machine Learning · Clinical Informatics