Reading Between the Tokens: Improving Preference Predictions through Mechanistic Forecasting

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a novel “mechanistic prediction” paradigm for forecasting human preferences, moving beyond reliance on surface-level model outputs to leverage structured information embedded within large language models’ internal representations. Using election prediction as a case study, the approach identifies and activates neural components encoding party affiliation, demographic attributes, and ideological signals. Through extensive experiments across seven prominent models, electoral data from six countries, and over 24 million configurations—combined with large-scale representation probing, cross-model and cross-national validation, and multidimensional prompt controls—the study systematically uncovers interpretable and exploitable preference signals within model internals. Results demonstrate that this method significantly outperforms conventional output-based prediction, with pronounced gains in accuracy for specific demographic groups, political parties, and national contexts.

📝 Abstract
Large language models are increasingly used to predict human preferences in both scientific and business endeavors, yet current approaches rely exclusively on analyzing model outputs without considering the underlying mechanisms. Using election forecasting as a test case, we introduce mechanistic forecasting and show that probing internal model representations offers a fundamentally different, and sometimes more effective, approach to preference prediction. Examining over 24 million configurations across 7 models, 6 national elections, multiple persona attributes, and prompt variations, we systematically analyze how demographic and ideological information activates latent party-encoding components within the respective models. We find that leveraging this internal knowledge via mechanistic forecasting (as opposed to relying solely on surface-level predictions) can improve prediction accuracy. The effects vary across demographic versus opinion-based attributes, political parties, national contexts, and models. Our findings demonstrate that the latent representational structure of LLMs contains systematic, exploitable information about human preferences, establishing a new path for using language models in social science prediction tasks.
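The core idea of probing internal representations can be sketched with a linear probe: fit a classifier from hidden-state vectors to a latent attribute (here, party preference) and evaluate it on held-out personas. The paper does not publish its probing code, so everything below is an illustrative stand-in with synthetic data, not the authors' implementation.

```python
# Hedged sketch of a linear probe for a latent "party" signal.
# The hidden states are synthetic: in practice they would be
# activations extracted from an LLM for each persona prompt.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 personas x 64-dim hidden states, with
# party preference linearly encoded along one random direction.
n, d = 200, 64
party = rng.integers(0, 2, size=n)             # 0 / 1 = two parties
direction = rng.normal(size=d)
hidden = rng.normal(size=(n, d)) + np.outer(party * 2 - 1, direction)

# Linear probe: logistic regression from hidden states to party label,
# trained on the first 150 personas and tested on the remaining 50.
probe = LogisticRegression(max_iter=1000).fit(hidden[:150], party[:150])
acc = probe.score(hidden[150:], party[150:])
print(f"probe accuracy on held-out personas: {acc:.2f}")
```

If such a probe recovers the attribute well above chance, the representation carries exploitable preference information, which is the premise mechanistic forecasting builds on; the paper's actual pipeline additionally intervenes on ("activates") the identified components rather than only reading them out.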
Problem

Research questions and friction points this paper is trying to address.

preference prediction
large language models
mechanistic forecasting
latent representations
human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

mechanistic forecasting
preference prediction
latent representations
large language models
election forecasting