Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work introduces the polylogue: the time series of alignments between persona vectors and hidden activations over the course of generation. Rather than using persona vectors as static handles for behavioural steering, the authors treat them as dynamic signals that can be monitored and intervened on as reasoning unfolds. Across four open-weight models, polylogue features predict answer correctness on MMLU-Pro competitively with low-dimensional activation baselines while remaining interpretable through their associated persona directions. A simple paragraph-conditioned latent-space intervention built on these features improves accuracy on three of the four models, which the authors present as evidence that stage-aware latent steering is a promising direction for reasoning-time control.
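To make the idea of an alignment time series concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the use of cosine similarity as the alignment measure, the function names `cosine` and `polylogue`, the persona names, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def polylogue(hidden_states, persona_vectors):
    """Map each persona name to its alignment with the hidden state at every step.

    hidden_states: list of activation vectors, one per generation step.
    persona_vectors: dict of persona name -> direction in activation space.
    Assumption: alignment is measured with cosine similarity.
    """
    return {
        name: [cosine(h, vec) for h in hidden_states]
        for name, vec in persona_vectors.items()
    }


# Toy data (hypothetical): three generation steps in a 2-D activation space.
hidden_states = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
persona_vectors = {"cautious": [1.0, 0.0], "confident": [0.0, 1.0]}

series = polylogue(hidden_states, persona_vectors)
for name, values in series.items():
    print(name, [round(x, 3) for x in values])
```

In this toy run, the alignment with the hypothetical "cautious" direction falls over the three steps while the alignment with "confident" rises. Trajectories of this kind are what could then be summarised into features for predicting correctness.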

📝 Abstract

Recent work shows that large language models (LLMs) encode behavioural traits ("personas") as linear directions in activation space, often called "persona vectors". Prior work has used such directions as static handles for behavioural steering. Building on this, we treat them as dynamic signals instead: probes we can monitor and intervene on as reasoning unfolds. We use the term polylogue to denote the time series of alignments between persona vectors and hidden activations over the course of generation. Experiments across four open-weight models show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines, while remaining interpretable through their associated persona directions. They also suggest concrete steering targets, namely which latent directions to modulate at different stages of a response. We instantiate this as a simple paragraph-conditioned intervention that improves accuracy on three of four models, pointing to stage-aware latent steering as a promising direction for reasoning-time control. Together, this positions the polylogue as an interpretable tool for reasoning-time monitoring and intervention.
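A paragraph-conditioned intervention can be sketched as follows. The abstract does not specify the mechanism, so everything here is an assumption for illustration: paragraphs are counted by blank lines in the text generated so far, the hypothetical `schedule` maps a paragraph index to a persona name and a strength, and steering adds a scaled unit persona direction to the hidden state.

```python
import math


def unit(v):
    """Return v scaled to unit length (a copy of v if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def paragraph_index(text_so_far):
    """Assumption: paragraphs are separated by blank lines in the generated text."""
    return text_so_far.count("\n\n")


def steer(hidden, text_so_far, persona_vectors, schedule):
    """Shift the hidden state along the persona direction scheduled for this paragraph.

    schedule: dict of paragraph index -> (persona name, strength).
    Paragraphs without a schedule entry leave the hidden state unchanged.
    """
    idx = paragraph_index(text_so_far)
    if idx not in schedule:
        return list(hidden)
    name, strength = schedule[idx]
    direction = unit(persona_vectors[name])
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy data (hypothetical): steer toward "cautious" only in the second paragraph.
persona_vectors = {"cautious": [3.0, 4.0]}
schedule = {1: ("cautious", 5.0)}

first = steer([0.0, 0.0], "Let me think.", persona_vectors, schedule)
second = steer([0.0, 0.0], "Let me think.\n\nChecking", persona_vectors, schedule)
print(first, second)
```

Keying the schedule on paragraph position is one simple way to express "which latent directions to modulate at different stages of a response". In a real model the update would be applied to the activations of a chosen layer during generation rather than to a standalone list.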

Problem

Research questions and friction points this paper is trying to address.

polylogue

personas

large language models

reasoning

activation space

Innovation

Methods, ideas, or system contributions that make the work stand out.

polylogue

persona vectors

reasoning-time intervention