🤖 AI Summary
This study addresses the challenge of automatically selecting highly informative contextual examples for first-language vocabulary instruction for high school students. The authors propose a hybrid approach that integrates deep learning with handcrafted features to predict contextual informativeness. Central to their method is a novel metric, the Retention Competency Curve, which visualizes the trade-off between the proportion of good contexts discarded and the ratio of good to bad contexts retained. Their strongest model combines instruction-aware embeddings derived from fine-tuned Qwen3 with manually engineered contextual features, all modeled through a nonlinear regression head. Experimental results demonstrate that this model substantially improves corpus quality: while discarding 70% of the good contexts, it elevates the ratio of good to bad contexts to 440:1, substantially enhancing the efficacy of instructional materials.
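The hybrid architecture described above (instruction-aware embeddings concatenated with handcrafted features, fed to a nonlinear regression head) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the data is synthetic, the 768-dimensional embeddings stand in for the fine-tuned Qwen3 vectors, and the five handcrafted features are hypothetical placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins for the paper's inputs (shapes are assumptions):
# 768-d instruction-aware sentence embeddings plus a handful of
# handcrafted context features (e.g. context length, target-word position).
rng = np.random.default_rng(0)
n = 200
embeddings = rng.normal(size=(n, 768))    # stand-in for Qwen3 embeddings
handcrafted = rng.normal(size=(n, 5))     # hypothetical handcrafted features
X = np.hstack([embeddings, handcrafted])  # concatenate both feature groups
y = rng.uniform(size=n)                   # synthetic informativeness ratings

# Nonlinear regression head: a small MLP over the combined feature vector.
head = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
head.fit(X, y)
scores = head.predict(X)                  # predicted informativeness per context
```

Contexts can then be ranked by `scores` and filtered with a threshold, which is exactly the operating point the Retention Competency Curve sweeps over.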
📝 Abstract
We describe a modern deep learning system that automatically identifies informative contextual examples (\qu{contexts}) for first-language vocabulary instruction for high school students. Our paper compares three modeling approaches: (i) an unsupervised similarity-based strategy using MPNet's uniformly contextualized embeddings, (ii) a supervised framework built on instruction-aware, fine-tuned Qwen3 embeddings with a nonlinear regression head, and (iii) model (ii) augmented with handcrafted context features. We introduce a novel metric, the Retention Competency Curve, to visualize the trade-off between the discarded proportion of good contexts and the \qu{good-to-bad} context ratio, providing a compact, unified lens on model performance. Model (iii) delivers the most dramatic gains, achieving a good-to-bad ratio of 440 while discarding only 70\% of the good contexts. In summary, we demonstrate that a modern embedding model combined with a neural-network regression head, when guided by human supervision, yields a low-cost, large supply of near-perfect contexts for teaching a variety of target vocabulary words.
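The trade-off the Retention Competency Curve captures (proportion of good contexts discarded versus the good-to-bad ratio among retained contexts, as a score threshold sweeps) can be sketched as below. The function name, inputs, and threshold-sweep details are assumptions for illustration, not the authors' definition.

```python
import numpy as np

def retention_competency_curve(scores, labels):
    """Sketch of a retention-competency-style curve (hypothetical form).

    For each score threshold, contexts scoring at or above the threshold
    are retained. We record (a) the proportion of good contexts discarded
    and (b) the good-to-bad ratio among the retained contexts.
    scores: predicted informativeness per context; labels: 1 = good, 0 = bad.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    total_good = labels.sum()
    points = []
    for t in np.unique(scores):           # sweep thresholds in ascending order
        keep = scores >= t
        good_kept = int(labels[keep].sum())
        bad_kept = int((1 - labels[keep]).sum())
        discarded_good = 1.0 - good_kept / total_good
        ratio = good_kept / bad_kept if bad_kept else float("inf")
        points.append((discarded_good, ratio))
    return points
```

Under this reading, an operating point such as "good-to-bad ratio of 440 at 70% of good contexts discarded" is one point on the curve; plotting all points shows how aggressively filtering must discard good contexts to purify the retained set.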