AI Summary
To address the scarcity of bilingual (Chinese–English) labeled data for entity-level fine-grained sentiment analysis in finance, this paper constructs what its authors describe as the largest English and Chinese entity-level financial sentiment analysis datasets to date. It also proposes Self-aware In-context Learning Correction (SILC), a two-stage framework. In the first stage, a base large language model (LLM) is fine-tuned to generate task-specific pseudo-labeled data. In the second stage, a correction model is trained using a graph neural network (GNN)-based example retriever that is informed by the pseudo-labeled data. SILC achieves state-of-the-art performance on the newly constructed datasets, and a case study demonstrates its practical utility in monitoring the cryptocurrency market. The datasets and source code are publicly released.
Abstract
In recent years, fine-grained sentiment analysis in finance has gained significant attention, but the scarcity of entity-level datasets remains a key challenge. To address this, we have constructed the largest English and Chinese financial entity-level sentiment analysis datasets to date. Building on this foundation, we propose a novel two-stage sentiment analysis approach called Self-aware In-context Learning Correction (SILC). The first stage involves fine-tuning a base large language model to generate pseudo-labeled data specific to our task. In the second stage, we train a correction model using a GNN-based example retriever, which is informed by the pseudo-labeled data. This two-stage strategy has allowed us to achieve state-of-the-art performance on the newly constructed datasets, advancing the field of financial sentiment analysis. In a case study, we demonstrate the enhanced practical utility of our data and methods in monitoring the cryptocurrency market. Our datasets and code are available at https://github.com/NLP-Bin/SILC-EFSA.