Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols

📅 2026-05-06
📈 Citations: 0
Influential: 0

📝 Abstract
Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction practices. We identify issues in attention dimension settings that affect performance estimates, and demonstrate that improper ordering of student attempts, such as ignoring ServerTimestamp, can violate temporal causality and lead to overly optimistic results. To ensure consistent evaluation, hyperparameters are selected via grid search guided by a single designated fold and then fixed uniformly across all folds during cross-validation. We further analyze the role of assignment-wise characteristics and systematically explore the impact of maximum sequence length. Using this protocol, we re-evaluate PKT models on the CodeWorkout dataset. Our results show that, under controlled and consistent settings, the performance gap between attention-enhanced models and standard DKT is significantly reduced, and increased architectural complexity does not consistently translate into superior performance. Beyond individual model comparisons, this work provides practical guidance for reliable and comparable evaluation in programming knowledge tracing.
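The two protocol points in the abstract, ordering attempts by ServerTimestamp before building sequences and fixing hyperparameters chosen on one designated fold, can be illustrated with a minimal sketch. It is not the authors' code. The field name `SubjectID`, the ISO-8601 timestamp format, the choice to keep the earliest `max_len` attempts, and the helpers `build_sequences`, `select_then_fix`, and `train_eval` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime
from itertools import product


def build_sequences(attempts, max_len):
    """Group attempts per student and order them by ServerTimestamp.

    `attempts` is a list of dicts. The key "SubjectID" is an assumed
    column name; "ServerTimestamp" is assumed to be an ISO-8601 string.
    """
    by_student = defaultdict(list)
    for row in attempts:
        by_student[row["SubjectID"]].append(row)

    sequences = {}
    for student, rows in by_student.items():
        # Sorting by timestamp keeps each prediction from seeing
        # attempts that happened later in time.
        ordered = sorted(
            rows, key=lambda r: datetime.fromisoformat(r["ServerTimestamp"])
        )
        # Assumption: truncate to the earliest max_len attempts.
        sequences[student] = ordered[:max_len]
    return sequences


def select_then_fix(grid, folds, train_eval, tuning_fold=0):
    """Pick hyperparameters on one fold, then reuse them on every fold.

    `train_eval(params, fold)` is a hypothetical callable that trains a
    model on the fold and returns a validation score (higher is better).
    """
    keys = sorted(grid)
    candidates = [
        dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))
    ]
    # Grid search is run on the designated fold only.
    best = max(candidates, key=lambda p: train_eval(p, folds[tuning_fold]))
    # The selected configuration is applied unchanged to all folds.
    scores = [train_eval(best, fold) for fold in folds]
    return best, scores
```

Passing several values of `max_len` to `build_sequences` while holding everything else fixed is one way to study the effect of maximum sequence length that the abstract mentions.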
Problem

Research questions and friction points this paper is trying to address.

Programming Knowledge Tracing
Reliability
Attention Mechanism
Experimental Protocol
Temporal Causality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Programming Knowledge Tracing
Attention Mechanism
Evaluation Protocol
Temporal Causality
Model Reliability