🤖 AI Summary
This study identifies systematic representational biases in the open-source Qwen family of large language models—specifically concerning firm size, industry classification, and financial characteristics—and examines their implications for credibility assessment in investment decision-making. We propose Balanced Polling Prompting, a novel method integrating constrained decoding with token-level logit aggregation, enabling the first quantitative measurement of confidence bias across multiple financial contexts. From these signals we construct a firm-level confidence scoring framework and analyze it with statistical hypothesis testing and ANOVA. Results show that firm size and valuation positively correlate with model confidence, whereas risk factors significantly suppress it; technology-sector predictions exhibit the highest bias volatility; fundamental metrics (e.g., P/E, ROE) elicit the most robust ranking behavior, while growth indicators (e.g., revenue CAGR, EPS growth) yield the weakest discriminative performance. This work establishes a new methodology and empirical benchmark for trustworthy evaluation and bias mitigation in financial foundation models.
📝 Abstract
Large Language Models are increasingly adopted in financial applications to support investment workflows. However, prior studies have seldom examined how these models reflect biases related to firm size, sector, or financial characteristics, all of which can significantly affect decision-making. This paper addresses this gap by focusing on representation bias in open-source Qwen models. We propose Balanced Polling Prompting, a balanced round-robin method applied over approximately 150 U.S. equities that combines constrained decoding with token-logit aggregation to derive firm-level confidence scores across financial contexts. Using statistical tests and variance analysis, we find that firm size and valuation consistently increase model confidence, while risk factors tend to decrease it. Confidence varies significantly across sectors, with the Technology sector showing the greatest variability. When models are prompted for specific financial categories, their confidence rankings align best with fundamental data, moderately with technical signals, and least with growth indicators. These results highlight representation bias in Qwen models and motivate sector-aware calibration and category-conditioned evaluation protocols for safe and fair financial LLM deployment.
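The core scoring step described above—constrained decoding with token-logit aggregation—can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the model's final-position logits have already been obtained, and that the answer space is restricted to a small set of allowed answer tokens (e.g., tokens for "high" vs. "low" confidence). The function names, token ids, and example logits are hypothetical.

```python
import numpy as np

def confidence_from_logits(vocab_logits, answer_token_ids):
    """Restrict the vocabulary logits to an allowed answer-token set
    (constrained decoding) and renormalize with a softmax, yielding a
    probability distribution over the permitted answers only."""
    allowed = np.array([vocab_logits[t] for t in answer_token_ids], dtype=float)
    exp = np.exp(allowed - allowed.max())  # numerically stable softmax
    return exp / exp.sum()

# Hypothetical logits from one forward pass over a toy 5-token vocabulary,
# with the "high-confidence" answer at id 1 and "low-confidence" at id 3.
vocab_logits = np.array([0.1, 2.0, -1.0, 0.5, 0.0])
probs = confidence_from_logits(vocab_logits, answer_token_ids=[1, 3])

# A firm-level score would then aggregate P(high) across a balanced set
# of prompt variants, e.g. firm_score = np.mean(p_high_per_prompt).
print(probs)
```

Because the softmax is taken only over the allowed tokens, the resulting probabilities sum to one regardless of what mass the model assigns to the rest of the vocabulary, which is what makes scores comparable across prompts and firms.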