Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how susceptible large language models (LLMs) are to context-induced behavioral biases in multilingual financial misinformation detection, where the same claim can receive different judgments under different economic scenarios. To evaluate this systematically, the authors introduce the MFMD-Scen benchmark, which combines a multilingual financial misinformation dataset in English, Chinese, Greek, and Bengali with three expert-designed scenario types: role and personality, role and region, and role combined with ethnicity and religious beliefs. The benchmark is used to assess 22 mainstream LLMs in high-stakes financial settings. The findings show pronounced context-induced biases across both commercial and open-source models, highlighting how fragile their judgments are in nuanced financial contexts.

📝 Abstract

Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of the complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (MFMD). In this work, we propose MFMD-Scen, a comprehensive benchmark for evaluating behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, MFMD-Scen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project will be available at https://github.com/lzw108/FMD.
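
To make the idea of pairing scenarios with claims concrete, here is a minimal Python sketch of one way such an evaluation could be set up. It is an illustration under stated assumptions, not the authors' pipeline: the scenario texts, the question wording, the `toy_model` stand-in, and the `flip_rate` metric are all hypothetical and do not come from the paper or its repository.

```python
from itertools import product

# Hypothetical scenario prefixes, one per scenario type named in the abstract.
# The actual expert-designed scenarios are not reproduced here.
SCENARIOS = {
    "baseline": "",
    "role_personality": "You are a cautious retail investor. ",
    "role_region": "You are a bank analyst working in a placeholder region. ",
    "role_belief": "You are a fund manager with a placeholder cultural background. ",
}


def build_prompts(claims, scenarios):
    """Cross every claim with every scenario prefix."""
    return {
        (claim_id, name): f"{prefix}Is the following financial claim true or false?\n{text}"
        for (claim_id, text), (name, prefix) in product(claims.items(), scenarios.items())
    }


def flip_rate(predictions, baseline="baseline"):
    """Share of (claim, scenario) pairs whose label differs from the
    baseline label for the same claim. An assumed metric for illustration."""
    flips = 0
    total = 0
    for (claim_id, name), label in predictions.items():
        if name == baseline:
            continue
        total += 1
        if label != predictions[(claim_id, baseline)]:
            flips += 1
    return flips / total if total else 0.0


def toy_model(prompt):
    """Stand-in for an LLM call: answers 'false' whenever the prompt
    mentions a cautious persona, and 'true' otherwise."""
    return "false" if "cautious" in prompt else "true"


if __name__ == "__main__":
    claims = {
        "c1": "Company A doubled its revenue last quarter.",
        "c2": "The central bank cut interest rates yesterday.",
    }
    prompts = build_prompts(claims, SCENARIOS)
    predictions = {key: toy_model(prompt) for key, prompt in prompts.items()}
    print(f"flip rate vs. baseline: {flip_rate(predictions):.2f}")
```

In a real run, `toy_model` would be replaced by a call to the model under test, and the same claims would be evaluated in each of the four languages. A nonzero flip rate on identical claims is one simple signal that the scenario, rather than the claim, is driving the judgment.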

Problem

Research questions and friction points this paper is trying to address.

behavioral bias

financial misinformation detection

multilingual

scenario-induced bias

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

scenario-induced bias

multilingual financial misinformation detection

behavioral bias in LLMs