🤖 AI Summary
This work addresses mutual information estimation from text in low-data regimes, where conventional approaches require training a task-specific critic. The authors propose estimating pointwise mutual information (PMI) zero-shot, using only large language models, prompts, and elicited probabilities. Their main method, PromptNCE, formulates conditional probability estimation as a contrastive prompting task and augments the candidate set with an explicit OTHER category. The authors show theoretically that adding OTHER recovers the true conditional P(y | x) rather than a mere ranking over the listed candidates, without any model training. On a new benchmark with human-derived ground-truth PMI across three publicly available datasets, PromptNCE is the best of the zero-shot methods evaluated on all three, with a Spearman correlation of up to 0.82 against human-derived PMI. A case study in computer science education shows how these estimators can be used to score student knowledge summaries in a low-data setting.
📝 Abstract
Estimating mutual information from text usually requires training a task-specific critic, which limits its use in low-data settings. We ask whether large language models can instead estimate pointwise mutual information zero-shot, using only prompts and elicited probabilities. We introduce a benchmark with human-derived ground-truth PMI across three publicly available datasets, and evaluate five information-theoretic prompting-based estimators. Our main method, PromptNCE, frames conditional probability estimation as a contrastive task and augments the candidate set with an explicit OTHER category. We show theoretically that adding OTHER recovers the true conditional P(y | x) rather than just a ranking over listed candidates, turning a contrastive prompt into a general-purpose zero-shot probability estimator. PromptNCE is the best zero-shot method on all three datasets, reaching Spearman correlation up to 0.82 with human-derived PMI. We also present a case study in computer science education showing how these estimators can be used to score student knowledge summaries in a low-data setting.
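The role of the OTHER category can be illustrated with a small numerical sketch. This is not the authors' implementation: the candidate labels, the score values, the helper names (`normalize`, `pmi`), and the use of a context-free prompt as a stand-in for the marginal P(y) are all assumptions made for illustration. The sketch only shows that normalizing over a candidate set that includes OTHER leaves probability mass for unlisted outcomes, and that PMI then follows as log P(y | x) - log P(y).

```python
import math

def normalize(log_scores):
    """Softmax over elicited log-scores for one prompt (numerically stable)."""
    m = max(log_scores.values())
    exps = {k: math.exp(v - m) for k, v in log_scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def pmi(p_y_given_x, p_y):
    """Pointwise mutual information: log P(y | x) - log P(y)."""
    return math.log(p_y_given_x) - math.log(p_y)

# Hypothetical log-scores an LLM might assign to each option given a context x.
# "OTHER" stands for every outcome not explicitly listed.
scores_given_x = {"A": -0.5, "B": -2.0, "OTHER": -1.5}

# Hypothetical log-scores from a context-free prompt, used here as an
# assumed stand-in for the marginal P(y).
scores_no_context = {"A": -1.5, "B": -1.5, "OTHER": -0.8}

cond = normalize(scores_given_x)      # estimate of P(y | x), OTHER included
marg = normalize(scores_no_context)   # estimate of P(y), OTHER included

# Closed-set normalization (no OTHER) only ranks the listed candidates and
# inflates their probabilities, because unlisted outcomes get zero mass.
closed = normalize({k: v for k, v in scores_given_x.items() if k != "OTHER"})

pmi_scores = {y: pmi(cond[y], marg[y]) for y in cond if y != "OTHER"}

for y, score in pmi_scores.items():
    print(f"{y}: P(y|x)={cond[y]:.3f}  closed-set={closed[y]:.3f}  PMI={score:+.3f}")
```

In this toy setup the ranking of A and B is the same with or without OTHER, but only the OTHER-augmented distribution yields values that can be read as probabilities and compared against a marginal.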