HEART: Emotionally-driven test-time scaling of Language Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing test-time scaling methods primarily optimize logical structure while overlooking the guiding potential of affective feedback. This work introduces HEART, a framework that models Paul Ekman's six basic emotions as concise, incentive-laden phrases and injects them as affective feedback prompts during inference to drive iterative self-correction, helping language models escape flawed reasoning paths. When guided by an oracle verifier, HEART delivers consistent and substantial accuracy gains over state-of-the-art methods on challenging reasoning benchmarks, including OlympiadBench and Humanity's Last Exam, demonstrating the value of affective feedback for deep reasoning. In the verifier-free setting, however, these gains are harder to harness consistently, exposing a critical bottleneck for real-world deployment.

📝 Abstract
Test-time scaling has shown considerable success in improving the performance of language models on complex reasoning tasks without requiring fine-tuning. However, current strategies such as self-reflection primarily focus on logical or structural refinement; they do not leverage the guiding potential of affective feedback. Inspired by psychological research showing that emotions can modulate cognitive performance, we introduce HEART, a novel framework that uses emotionally-driven prompts for iterative self-correction. HEART provides feedback on a model's incorrect response using a curated set of concise, emotionally charged phrases based on the six universal emotions categorized by Dr. Paul Ekman. By systematically varying the emotional tone of the feedback across iterations, our method guides the model to escape flawed reasoning paths and explore more promising alternatives. We evaluate our framework on challenging reasoning benchmarks including OlympiadBench, Humanity's Last Exam, and SimpleQA. Our results reveal a significant new phenomenon: when guided by an oracle verifier, this affective iteration protocol unlocks significantly deeper reasoning, leading to consistent and substantial increases in accuracy over state-of-the-art baselines with the same verifier. However, we also identify a critical bottleneck for practical deployment: in a verifier-free setting, the framework struggles to harness these gains consistently, which we highlight as a key challenge for future work. Our findings suggest that the next frontier in machine reasoning may lie not just in refining logic, but also in understanding and leveraging the 'HEART' of the models.
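
To make the iteration protocol concrete, below is a minimal sketch of a verifier-augmented HEART-style loop. The specific emotion phrases, the `call_model` and `verify` callables, and the iteration budget are illustrative assumptions for this sketch; the paper's actual prompt wordings and implementation are not reproduced here.

```python
# Minimal sketch of HEART-style affective test-time scaling (verifier-augmented).
# The phrases, model call, and verifier below are illustrative placeholders,
# not the paper's actual prompts or code.

from typing import Callable, Optional

# One incentive-laden phrase per Ekman basic emotion (hypothetical wording).
EMOTION_PROMPTS = {
    "anger":     "Your previous answer was wrong. This is frustrating -- fix it now.",
    "fear":      "Your previous answer was wrong. A mistake here would be costly -- be careful.",
    "sadness":   "Your previous answer was wrong. It is disappointing -- please try again.",
    "disgust":   "Your previous answer was wrong. That reasoning was sloppy -- redo it properly.",
    "surprise":  "Your previous answer was wrong. Surprisingly so -- reconsider your approach.",
    "happiness": "You are close! One more careful attempt and you will get it right.",
}

def heart_iterate(
    question: str,
    call_model: Callable[[str], str],      # returns the model's answer for a prompt
    verify: Callable[[str, str], bool],    # oracle verifier: is this answer correct?
    max_iters: int = 6,
) -> Optional[str]:
    """Iterative self-correction driven by emotionally charged feedback phrases."""
    answer = call_model(question)
    if verify(question, answer):
        return answer

    # Vary the emotional tone of the feedback across iterations to push the
    # model off a flawed reasoning path.
    for i, emotion in enumerate(EMOTION_PROMPTS):
        if i >= max_iters:
            break
        feedback = EMOTION_PROMPTS[emotion]
        prompt = (
            f"{question}\n\nPrevious answer: {answer}\n"
            f"Feedback: {feedback}\nAnswer again:"
        )
        answer = call_model(prompt)
        if verify(question, answer):
            return answer
    return None  # verifier never accepted an answer within the budget
```

In the verifier-free setting discussed in the abstract, `verify` would have to be replaced by a heuristic stopping rule or the model's own judgment, which is where the reported consistency bottleneck arises.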
Problem

Research questions and friction points this paper is trying to address.

Leveraging emotional feedback to improve language model reasoning
Escaping flawed reasoning paths through emotionally-driven prompts
Addressing limitations of logical-only test-time scaling methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses emotionally-driven prompts for iterative self-correction
Varies emotional tone of feedback across reasoning iterations
Leverages six universal emotions to guide model reasoning
🔎 Similar Papers
No similar papers found.