🤖 AI Summary
This study addresses sentence-level stance detection toward three financial targets—debt, earnings per share (EPS), and sales—in financial texts. To overcome the scarcity of labeled data, we propose a large language model (LLM)-driven approach that requires no large-scale manual annotation. We construct the first fine-grained stance detection corpus specifically for these three key financial metrics, with initial labels generated by ChatGPT-o3-pro and rigorously validated by domain experts. We systematically evaluate zero-shot, few-shot, and chain-of-thought (CoT) prompting strategies. Results show that few-shot + CoT prompting significantly outperforms supervised baselines and demonstrates strong generalization across two distinct financial text genres: SEC annual reports and earnings call transcripts. Moreover, we identify notable genre-specific effects on stance classification performance. This work constitutes the first empirical validation of LLMs’ effectiveness and practicality for low-resource financial stance analysis, establishing a novel paradigm for fine-grained, interpretable financial semantic analysis.
📝 Abstract
Financial narratives from U.S. Securities and Exchange Commission (SEC) filings and quarterly earnings call transcripts (ECTs) are essential reading for investors, auditors, and regulators. However, their length, financial jargon, and nuanced language make fine-grained analysis difficult. Prior sentiment analysis in the financial domain has relied on large, expensive labeled datasets, making sentence-level stance detection toward specific financial targets challenging. In this work, we introduce a sentence-level corpus for stance detection focused on three core financial metrics: debt, earnings per share (EPS), and sales. The sentences were extracted from Form 10-K annual reports and ECTs, and labeled for stance (positive, negative, or neutral) by the ChatGPT-o3-pro model under rigorous human validation. Using this corpus, we conduct a systematic evaluation of modern large language models (LLMs) under zero-shot, few-shot, and chain-of-thought (CoT) prompting strategies. Our results show that few-shot prompting with CoT outperforms supervised baselines, and that LLM performance varies between the SEC and ECT datasets. These findings highlight the practical viability of LLMs for target-specific stance detection in the financial domain without requiring extensive labeled data.
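To make the evaluated prompting setup concrete, the sketch below shows what a few-shot + CoT prompt for target-specific stance detection might look like. The instruction wording, the in-context examples, and the `build_prompt` helper are all illustrative assumptions; the paper does not specify its exact prompt templates.

```python
# Hypothetical few-shot + chain-of-thought (CoT) prompt construction for
# target-specific financial stance detection. The examples and wording are
# illustrative, not the authors' actual templates.

FEW_SHOT_EXAMPLES = [
    {
        "sentence": "We reduced our long-term debt by $1.2 billion this quarter.",
        "target": "debt",
        "reasoning": "Paying down debt is framed favorably, so the stance toward debt is positive.",
        "stance": "positive",
    },
    {
        "sentence": "Sales were flat year over year across all segments.",
        "target": "sales",
        "reasoning": "The sentence reports no change and offers no evaluation, so the stance is neutral.",
        "stance": "neutral",
    },
]

def build_prompt(sentence: str, target: str) -> str:
    """Assemble a few-shot + CoT prompt: instruction, worked examples, then the query."""
    lines = [
        "Classify the stance (positive, negative, or neutral) expressed toward "
        "the given financial target. Think step by step before answering.",
        "",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines += [
            f"Sentence: {ex['sentence']}",
            f"Target: {ex['target']}",
            f"Reasoning: {ex['reasoning']}",
            f"Stance: {ex['stance']}",
            "",
        ]
    # End with an open "Reasoning:" cue so the model produces its CoT
    # before committing to a stance label.
    lines += [f"Sentence: {sentence}", f"Target: {target}", "Reasoning:"]
    return "\n".join(lines)

prompt = build_prompt("EPS declined 12% due to margin pressure.", "EPS")
```

In the zero-shot variant, `FEW_SHOT_EXAMPLES` would simply be empty; dropping the trailing "Reasoning:" cue would remove the CoT component.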