🤖 AI Summary
This work investigates how simulated Big Five personality traits in large language models (LLMs) modulate priming effects in relevance judgment tasks. To address the susceptibility of LLMs to carryover bias from prior judgments, we introduce personality prompting, which systematically injects the five dimensions (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism) into relevance assessment prompts. We conduct an empirical evaluation across multiple models (Llama-3, Qwen, GPT-4) on the TREC 2021 and 2022 Deep Learning Tracks. Results show that high Openness and low Neuroticism significantly attenuate priming effects, and that optimal personality configurations are model- and task-dependent. This study integrates personality psychology principles into prompt engineering, establishing an interpretable and controllable paradigm for mitigating judgment biases in LLMs.
📝 Abstract
Recent research has explored LLMs as scalable tools for relevance labeling, but studies indicate they are susceptible to priming effects, where prior relevance judgments influence later ones. Although psychological theories link personality traits to such biases, it is unclear whether simulated personalities in LLMs exhibit similar effects. We investigate how Big Five personality profiles in LLMs influence priming in relevance labeling, using multiple LLMs on the TREC 2021 and 2022 Deep Learning Track datasets. Our results show that certain profiles, such as high Openness and low Neuroticism, consistently reduce priming susceptibility. Additionally, the most effective personality profile for mitigating priming varies across models and task types. Based on these findings, we propose personality prompting as a method to mitigate threshold priming, connecting psychological evidence with LLM-based evaluation practices.
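The personality prompting idea described above can be illustrated with a minimal sketch. The trait descriptions, prompt template, and function names below are illustrative assumptions, not the exact prompts or grading scale used in the paper:

```python
# Hypothetical sketch of personality prompting for LLM relevance labeling.
# Trait phrasings and the prompt template are assumptions for illustration.

# Natural-language descriptions for (trait, level) pairs; only the two
# profiles highlighted in the paper's findings are sketched here.
TRAIT_DESCRIPTIONS = {
    ("Openness", "high"): (
        "You are highly open-minded and curious, and you evaluate each "
        "document strictly on its own merits."
    ),
    ("Neuroticism", "low"): (
        "You are calm and emotionally stable; earlier judgments do not "
        "sway your current assessment."
    ),
}

def build_prompt(profile, query, passage):
    """Prepend Big Five trait descriptions to a relevance-judgment instruction."""
    persona = " ".join(TRAIT_DESCRIPTIONS[(trait, level)] for trait, level in profile)
    return (
        f"{persona}\n"
        f"Query: {query}\n"
        f"Passage: {passage}\n"
        "Rate the relevance of the passage to the query on a 0-3 scale."
    )

prompt = build_prompt(
    [("Openness", "high"), ("Neuroticism", "low")],
    "effects of caffeine on sleep",
    "Caffeine can delay sleep onset by blocking adenosine receptors.",
)
print(prompt)
```

The resulting string would be sent as the judging prompt to each model under test; varying the `profile` argument is what lets the study compare priming susceptibility across personality configurations.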