🤖 AI Summary
This study addresses the prediction of English vocabulary difficulty to support personalized instruction and assessment design. It proposes two complementary models. The first is a high-accuracy black-box model: a large language model fine-tuned with a soft-target loss function that adapts it to the rating task. It achieved the top result in the open track of the shared task (r > 0.91). The second is an explainable model that reaches r > 0.77, outperforming a fine-tuned encoder baseline, while showing what drives the difficulty of each item. Analysis of the results indicates that item difficulty in the British Council's Knowledge-based Vocabulary Lists (KVL) often reflects spelling difficulty or the construction of the test items, in addition to the genuine production difficulty of the words.
📝 Abstract
We describe two types of models for vocabulary difficulty prediction: a high-accuracy black-box model, which achieved the top shared task result in the open track, and an explainable model, which outperforms a fine-tuned encoder baseline. As the black-box model, we fine-tuned an LLM using a soft-target loss function for effective application to the rating task, achieving r > 0.91. The explainable model provides insights into what impacts the difficulty of each item while maintaining a strong correlation (r > 0.77). We further analyze the results, demonstrating that the difficulty of items in the British Council's Knowledge-based Vocabulary Lists (KVL) is often affected by spelling difficulty or the construction of the test items, in addition to the genuine production difficulty of the words. We make our code available online at https://github.com/adno/vocabulary-difficulty .
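The abstract does not spell out the soft-target loss, so the following is only a minimal sketch of one common way such a loss can be built for a rating task, not the authors' implementation. It assumes the continuous difficulty rating is spread over a small set of discrete rating bins with a Gaussian kernel, and that the model is trained with cross-entropy against that soft distribution. The bin centers, the `sigma` width, and all function names are hypothetical choices made for illustration.

```python
import math

def soft_targets(score, centers, sigma=0.5):
    """Spread a continuous rating over discrete bins (assumed Gaussian kernel)."""
    weights = [math.exp(-((score - c) ** 2) / (2 * sigma ** 2)) for c in centers]
    total = sum(weights)
    return [w / total for w in weights]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def soft_target_loss(logits, score, centers, sigma=0.5):
    """Cross-entropy between the soft target distribution and the model output."""
    target = soft_targets(score, centers, sigma)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))

def expected_score(logits, centers):
    """Decode a continuous rating as the probability-weighted mean of the bins."""
    return sum(math.exp(lp) * c for lp, c in zip(log_softmax(logits), centers))

# Hypothetical 5-point rating scale and a gold rating of 2.2.
centers = [1.0, 2.0, 3.0, 4.0, 5.0]
near = soft_target_loss([0.0, 3.0, 1.0, 0.0, 0.0], 2.2, centers)
far = soft_target_loss([0.0, 0.0, 0.0, 0.0, 3.0], 2.2, centers)
print(near < far)  # logits peaked near the gold rating are penalized less
```

Under these assumptions, a prediction that puts its mass on a neighboring bin is penalized less than one that lands far from the gold rating, which a one-hot classification target would not distinguish. Decoding with `expected_score` returns a continuous value that can be correlated with the gold ratings.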