🤖 AI Summary
This work addresses the challenge of characterizing and comparing the conditional response distributions of language models under varying prompts. To this end, it constructs log-likelihood and pointwise mutual information (PMI) vectors from prompt–response pairs, embedding models in a space where distributional differences can be quantified via approximations of the Kullback–Leibler divergence. The resulting high-dimensional embedding reveals global structural relationships among models and the systematic shifts induced by prompt perturbations, and further shows that composite prompt effects approximately follow an additive composition rule. Experiments demonstrate that the resulting "model map" correlates with model attributes, task performance, and training-data differences, enabling analysis and prediction of model behavior under complex prompt manipulations.
📝 Abstract
We propose a method that represents language models by log-likelihood vectors over prompt-response pairs and constructs model maps for comparing their conditional distributions. In this space, distances between models approximate the KL divergence between the corresponding conditional distributions. Experiments on a large collection of publicly available language models show that the maps capture meaningful global structure, including relationships to model attributes and task performance. The method also captures systematic shifts induced by prompt modifications and their approximate additive compositionality, suggesting a way to analyze and predict the effects of composite prompt operations. We further introduce pointwise mutual information (PMI) vectors to reduce the influence of unconditional distributions; in some cases, PMI-based model maps better reflect training-data-related differences. Overall, the framework supports the analysis of input-dependent model behavior.
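The core representation described above can be sketched numerically. In the toy example below, each model is a vector of log-likelihoods over prompt–response pairs; squared Euclidean distances between (per-coordinate centered) vectors stand in for the KL-divergence approximation, and PMI vectors subtract an unconditional log-likelihood term. All concrete choices here (the random data, the centering and 1/N scaling, the stand-in baseline for the unconditional distribution) are assumptions of this sketch, not the paper's exact recipe:

```python
import numpy as np

# Toy data: log p(response | prompt) for each model on each prompt-response
# pair. Rows = models, columns = pairs. Values are synthetic, not real scores.
rng = np.random.default_rng(0)
n_models, n_pairs = 4, 6
loglik = rng.normal(loc=-5.0, scale=1.0, size=(n_models, n_pairs))

# Center each coordinate across models so per-pair difficulty shared by all
# models cancels out (a simplifying choice made for this sketch).
centered = loglik - loglik.mean(axis=0, keepdims=True)

# Pairwise squared Euclidean distances between model vectors; under the
# abstract's framing, distances in this space approximate KL divergences
# between the models' conditional distributions.
diff = centered[:, None, :] - centered[None, :, :]
dist2 = (diff ** 2).sum(axis=-1) / n_pairs

# PMI vectors: subtract the unconditional log-likelihood log p(response).
# Here a synthetic "baseline" vector plays that role (another assumption).
baseline = rng.normal(loc=-5.0, scale=1.0, size=n_pairs)
pmi = loglik - baseline

print(dist2.shape)                  # (4, 4) distance matrix over models
print(np.allclose(dist2, dist2.T))  # symmetric by construction
```

A "model map" is then any low-dimensional embedding of `dist2` (e.g. classical MDS); the map built from `pmi` instead of `loglik` is the PMI-based variant mentioned above.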