Generative Data Imputation for Sparse Learner Performance Data Using Generative Adversarial Imputation Networks

📅 2025-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the three-dimensional (learner-item-attempt) sparsity in learner response data from intelligent tutoring systems—caused by skipping or incomplete attempts—this paper proposes GAIN-3D, a generative adversarial imputation network tailored for educational sequential interaction data. It pioneers the extension of the Generative Adversarial Imputation Network (GAIN) to 3D tensor structures, integrates CNN modules to enhance local pattern modeling, and adopts least-squares loss to improve imputation consistency and stability. Experiments on ARC, ASSISTments, and MATHia datasets demonstrate that GAIN-3D significantly outperforms conventional tensor decomposition methods and existing GAN-based imputation approaches. Validation via Bayesian Knowledge Tracing (BKT) parameter estimation and KL divergence analysis confirms that imputed data faithfully preserve the original distribution. Consequently, GAIN-3D substantially improves knowledge tracing model fit and enhances the reliability of learning behavior representation.
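The learner-item-attempt structure described above can be pictured as a 3D array paired with a binary mask that marks which cells were actually observed. The following is a minimal sketch under stated assumptions, not the paper's code: the `(learner, question, attempt, score)` record layout and the `build_tensor` helper are hypothetical, and scores are assumed to lie in [0, 1].

```python
import numpy as np

def build_tensor(records, n_learners, n_questions, n_attempts):
    """Arrange (learner, question, attempt, score) records into a 3D array.

    Returns the data tensor and a mask of the same shape where 1.0 marks an
    observed response and 0.0 marks a missing one. The record layout is an
    assumption made for this illustration.
    """
    data = np.zeros((n_learners, n_questions, n_attempts))
    mask = np.zeros_like(data)
    for learner, question, attempt, score in records:
        data[learner, question, attempt] = score
        mask[learner, question, attempt] = 1.0
    return data, mask

# Toy data: 2 learners, 2 questions, 2 attempts, only 3 observed responses.
records = [(0, 0, 0, 1.0), (0, 1, 0, 0.0), (1, 0, 1, 1.0)]
data, mask = build_tensor(records, n_learners=2, n_questions=2, n_attempts=2)

# Fraction of cells with no recorded response.
sparsity = 1.0 - mask.mean()

# Slicing along the learner axis gives one question-attempt matrix per learner.
learner0_matrix = data[0]
learner0_mask = mask[0]
```

Keeping the mask separate from the data lets an imputation model distinguish a recorded score of 0 from a cell that was never attempted.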

📝 Abstract

Learner performance data collected by Intelligent Tutoring Systems (ITSs), such as responses to questions, is essential for modeling and predicting learners' knowledge states. However, missing responses due to skips or incomplete attempts create data sparsity, challenging accurate assessment and personalized instruction. To address this, we propose a generative imputation approach using Generative Adversarial Imputation Networks (GAIN). Our method features a three-dimensional (3D) framework (learners, questions, and attempts), flexibly accommodating various sparsity levels. Enhanced by convolutional neural networks and optimized with a least squares loss function, the GAIN-based method aligns input and output dimensions to question-attempt matrices along the learners' dimension. Extensive experiments using datasets from AutoTutor Adult Reading Comprehension (ARC), ASSISTments, and MATHia demonstrate that our approach significantly outperforms tensor factorization and alternative GAN methods in imputation accuracy across different attempt scenarios. Bayesian Knowledge Tracing (BKT) further validates the effectiveness of the imputed data by estimating learning parameters: initial knowledge (P(L0)), learning rate (P(T)), guess rate (P(G)), and slip rate (P(S)). Results indicate the imputed data enhances model fit and closely mirrors original distributions, capturing underlying learning behaviors reliably. Kullback-Leibler (KL) divergence assessments confirm minimal divergence, showing the imputed data preserves essential learning characteristics effectively. These findings underscore GAIN's capability as a robust imputation tool in ITSs, alleviating data sparsity and supporting adaptive, individualized instruction, ultimately leading to more precise and responsive learner assessments and improved educational outcomes.
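The two validation steps named in the abstract can be illustrated with the standard textbook forms of BKT and KL divergence. This is a hedged sketch rather than the authors' evaluation code: the function names are hypothetical, the parameter values in the example are made up, and the KL helper assumes both inputs are discrete distributions over the same bins with `q` nonzero wherever `p` is nonzero.

```python
import math

def bkt_update(p_known, correct, p_transit, p_guess, p_slip):
    """One standard BKT step: condition on the response, then apply learning."""
    if correct:
        num = p_known * (1.0 - p_slip)
        den = num + (1.0 - p_known) * p_guess
    else:
        num = p_known * p_slip
        den = num + (1.0 - p_known) * (1.0 - p_guess)
    posterior = num / den  # P(known | observed response)
    return posterior + (1.0 - posterior) * p_transit

def bkt_trace(responses, p_init, p_transit, p_guess, p_slip):
    """Track the knowledge estimate across a sequence of True/False responses."""
    p_known = p_init
    trace = []
    for correct in responses:
        p_known = bkt_update(p_known, correct, p_transit, p_guess, p_slip)
        trace.append(p_known)
    return trace

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Illustrative values only: P(L0)=0.5, P(T)=0.1, P(G)=0.2, P(S)=0.1.
trace = bkt_trace([True, False, True], 0.5, 0.1, 0.2, 0.1)
```

Under this reading, BKT parameters fitted on imputed data would be compared with those fitted on the original data, and a KL value near zero would indicate that the two distributions are close.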

Problem

Research questions and friction points this paper is trying to address.

Imputing missing learner performance data in ITSs

Addressing data sparsity for accurate knowledge assessment

Enhancing personalized instruction with generative imputation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Adversarial Imputation Networks (GAIN) for imputing missing learner responses

3D framework spanning learners, questions, and attempts

Enhanced by convolutional neural networks and optimized with a least-squares loss
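One way to read the least-squares idea is to replace the cross-entropy terms of the usual GAIN objective with squared errors, in the style of LSGAN. The sketch below rests on that assumption; the exact loss used in the paper is not given here, and the function names, the `alpha` weight, and the toy arrays are all hypothetical.

```python
import numpy as np

def combine(x, g_out, mask):
    """Keep observed entries from x and fill missing ones from the generator."""
    return mask * x + (1.0 - mask) * g_out

def discriminator_ls_loss(d_out, mask):
    """Squared error between the discriminator's per-cell guess and the mask.

    d_out estimates, for each cell, whether the value was observed (1) or
    imputed (0).
    """
    return np.mean((d_out - mask) ** 2)

def generator_ls_loss(d_out, x, g_out, mask, alpha=10.0):
    """Adversarial term on missing cells plus reconstruction on observed cells.

    alpha is an assumed weighting, not a value taken from the paper.
    """
    n_missing = max(np.sum(1.0 - mask), 1.0)
    n_observed = max(np.sum(mask), 1.0)
    # The generator wants imputed cells to be scored as observed (target 1).
    adversarial = np.sum((1.0 - mask) * (d_out - 1.0) ** 2) / n_missing
    # Observed cells should be reproduced accurately by the generator.
    reconstruction = np.sum(mask * (x - g_out) ** 2) / n_observed
    return adversarial + alpha * reconstruction

# One learner's question-attempt matrix with two missing cells.
mask = np.array([[1.0, 0.0], [0.0, 1.0]])
x = np.array([[1.0, 0.0], [0.0, 0.0]])
g_out = np.array([[0.5, 0.8], [0.2, 0.0]])
imputed = combine(x, g_out, mask)

# A discriminator that identifies every cell correctly has zero loss.
d_loss = discriminator_ls_loss(mask, mask)
g_loss = generator_ls_loss(mask, x, g_out, mask)
```

In a full model, `g_out` and `d_out` would come from the convolutional generator and discriminator applied to each learner's question-attempt matrix; here they are fixed arrays so the loss arithmetic can be checked by hand.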

🔎 Similar Papers

Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI