🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit human-like stress responses. Method: Drawing on psychological theory, we construct StressPrompt—a systematically calibrated stress-inducing prompt set—and quantitatively assess how varying stress levels affect instruction following, complex reasoning, and affective understanding across multiple LLMs. We integrate psychometric scale calibration, human evaluation, multi-task benchmarking, and interpretability analyses of logits and attention patterns. Contribution/Results: We provide the first empirical validation of the Yerkes–Dodson law in LLMs: model performance follows a significant inverted-U curve, peaking under moderate stress. Crucially, stress-induced latent representation shifts—analyzed via attention and logit dynamics—exhibit cross-modal similarity to human neurophysiological stress responses observed in fMRI and EEG studies. These findings are robust across diverse state-of-the-art LLMs, establishing a novel paradigm for probing LLM cognition and advancing interpretable, human-aligned AI.
📝 Abstract
Human beings often experience stress, which can significantly influence their performance. This study explores whether Large Language Models (LLMs) exhibit stress responses similar to those of humans and whether their performance fluctuates under different stress-inducing prompts. To investigate this, we developed a novel set of prompts, termed StressPrompt, designed to induce varying levels of stress. These prompts were derived from established psychological frameworks and carefully calibrated based on ratings from human participants. We then applied these prompts to several LLMs to assess their responses across a range of tasks, including instruction following, complex reasoning, and emotional intelligence. The findings suggest that LLMs, like humans, perform optimally under moderate stress, consistent with the Yerkes–Dodson law: their performance declines under both low- and high-stress conditions. Our analysis further revealed that these StressPrompts significantly alter the internal states of LLMs, leading to changes in their neural representations that mirror human responses to stress. This research provides critical insights into the operational robustness and flexibility of LLMs, demonstrating the importance of designing AI systems capable of maintaining high performance in real-world scenarios where stress is prevalent, such as customer service, healthcare, and emergency response. Moreover, this study contributes to the broader AI research community by offering a new perspective on how LLMs handle different scenarios and on their similarities to human cognition.
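One way to check for the inverted-U pattern the abstract describes is to fit a quadratic to (stress level, task score) pairs and test whether the curve opens downward with its peak inside the tested range. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis: the function names `fit_quadratic` and `inverted_u_peak`, the 1 to 9 stress scale, and the scores are all hypothetical, and the scores are synthetic values chosen only to exercise the code.

```python
from typing import List, Optional, Tuple


def fit_quadratic(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need matching lists with at least 3 distinct x values")
    # Design matrix rows are [x^2, x, 1]; build the normal equations
    # (A^T A) w = A^T y as an augmented 3x4 matrix.
    rows = [[x * x, float(x), 1.0] for x in xs]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    w = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][k] * w[k] for k in range(i + 1, 3))
        w[i] = (m[i][3] - tail) / m[i][i]
    return w[0], w[1], w[2]


def inverted_u_peak(xs: List[float], ys: List[float]) -> Optional[float]:
    """Return the stress level at the fitted peak, or None if the fit is not
    an inverted U with its maximum strictly inside the tested range."""
    a, b, _ = fit_quadratic(xs, ys)
    if a >= 0:  # curve is flat or opens upward: no inverted U
        return None
    peak = -b / (2 * a)  # vertex of the parabola
    return peak if min(xs) < peak < max(xs) else None


if __name__ == "__main__":
    # Synthetic, illustrative data only (not results from the study).
    stress_levels = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    scores = [70 - (s - 5) ** 2 for s in stress_levels]
    print(inverted_u_peak(stress_levels, scores))
```

A negative quadratic coefficient alone is weak evidence; a fuller analysis would also test its statistical significance and check that performance falls on both sides of the peak.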