🤖 AI Summary
This study investigates whether persona-based prompting elicits meaningful and reproducible behavioral differences in multimodal large language models (MLLMs) on urban sentiment perception tasks. The authors build a factorial set of personas spanning gender, economic status, political orientation, and personality, instantiate multiple agents per persona, and have them rate urban scene images from the PerceptSent dataset. The work is presented as the first systematic evaluation of how persona prompting influences MLLM-based urban perception. Results show high within-persona consistency but limited cross-persona divergence: economic status and personality yield statistically detectable but practically modest effects, gender shows no measurable effect, and political orientation has only a negligible one. The agents also exhibit an extremity bias, collapsing the intermediate sentiment categories common in human annotations, so they perform well on coarse-grained polarity tasks and worse as sentiment resolution increases. A no-persona baseline sometimes matches or exceeds the persona-conditioned agents in agreement with human labels.
📝 Abstract
Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral diversity. We investigate whether distinct personas influence urban sentiment judgments generated by multimodal LLMs. Using a factorial set of personas spanning gender, economic status, political orientation, and personality, we instantiate multiple agents per persona to evaluate urban scene images from the PerceptSent dataset and assess both within-persona consistency and cross-persona variation. Results show strong convergence among agents sharing a persona, indicating stable and reproducible behavior. However, cross-persona differentiation is limited: economic status and personality induce statistically detectable but practically modest variation, while gender shows no measurable effect and political orientation only negligible impact. Agents also exhibit an extremity bias, collapsing intermediate sentiment categories common in human annotations. As a result, performance remains strong on coarse-grained polarity tasks but degrades as sentiment resolution increases, suggesting that simple label-based persona prompting does not capture fine-grained perceptual judgments. To isolate the contribution of persona conditioning, we additionally evaluate the same model without personas. Surprisingly, the no-persona model sometimes matches or exceeds persona-conditioned agreement with human labels across all task variants, suggesting that simple label-based persona prompting may add limited annotation value in this setting.
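As a rough illustration of the setup described above, the following Python sketch builds a factorial persona set, measures agreement among agents that share a persona, and collapses fine-grained scores to coarse polarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the factor levels, the 5-point score range from -2 to 2, and all function names are hypothetical, since the abstract names the four factors but not their levels or the exact metrics.

```python
from collections import Counter
from itertools import product

# Hypothetical factor levels: the abstract lists the factors, not their values.
FACTORS = {
    "gender": ["female", "male"],
    "economic_status": ["low income", "high income"],
    "political_orientation": ["liberal", "conservative"],
    "personality": ["optimistic", "pessimistic"],
}


def build_personas(factors):
    """Return one persona dict per combination of factor levels (full factorial)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def within_persona_agreement(labels):
    """Fraction of agents sharing a persona that chose the modal label."""
    modal_count = Counter(labels).most_common(1)[0][1]
    return modal_count / len(labels)


def to_polarity(score):
    """Collapse an assumed 5-point score (-2..2) to -1, 0, or 1."""
    return (score > 0) - (score < 0)


personas = build_personas(FACTORS)
```

With two levels per factor this yields 16 personas. Comparing `within_persona_agreement` across agents of one persona against the spread of modal labels across personas mirrors the within-persona and cross-persona comparison in the abstract, and `to_polarity` shows why an extremity bias costs less on coarse polarity tasks: a score of 2 given where humans chose 1 still maps to the same polarity.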