🤖 AI Summary
To address overfitting and poor generalization in supervised fine-tuning (SFT) of large language models (LLMs) for financial sentiment analysis, this paper proposes FinDPO—a novel framework that introduces Direct Preference Optimization (DPO) to the domain for the first time. FinDPO starts from a causal language model initialized via SFT and aligns its outputs with human preferences to enable fine-grained sentiment modeling. Its key innovation is a “logit-to-score” transformation that produces rankable, continuous sentiment scores—directly enabling the construction of quantitative trading strategies. On standard benchmarks, FinDPO achieves an average accuracy 11% higher than state-of-the-art methods. Empirical backtesting yields an annualized return of 67% and a Sharpe ratio of 2.0, remaining robust even under a 5-basis-point transaction cost. The core contribution lies in adapting preference learning to financial semantics, markedly enhancing model generalization to unseen market events and domain-specific terminology.
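The preference-alignment step the summary refers to follows the standard DPO objective (Rafailov et al.), which trains the policy to prefer the human-chosen completion over the rejected one relative to a frozen reference model. The sketch below is a minimal, illustrative implementation for a single preference pair; the function name and scalar log-probability inputs are assumptions for clarity, not the paper's actual training code.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    logp_* are the policy's log-probabilities of the chosen/rejected
    responses; ref_logp_* are the frozen reference model's. beta scales
    the implicit KL penalty (0.1 is a common default, assumed here).
    """
    # Log-ratios of policy vs. reference for each response.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Margin by which the policy prefers the chosen response.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(margin): small when the chosen response is favored.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference on both responses, the margin is zero and the loss is log 2; pushing probability mass toward the chosen response drives the loss down.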
📝 Abstract
Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel 'logit-to-score' conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). Simulations further demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).
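The 'logit-to-score' conversion described above can be sketched as a softmax over the logits of the sentiment class tokens, mapped to a single continuous score. The paper does not spell out its exact mapping in this abstract, so the class labels and the choice of score = P(positive) − P(negative) below are illustrative assumptions; any monotone mapping of the class probabilities would likewise yield rankable scores.

```python
import math

def logits_to_sentiment_score(logits: dict) -> float:
    """Map class logits {'positive', 'negative', 'neutral'} to a
    continuous score in [-1, 1].

    Illustrative sketch of a logit-to-score conversion: softmax the
    logits into probabilities, then take P(positive) - P(negative).
    """
    # Numerically stable softmax over the three class logits.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    probs = {k: e / z for k, e in exps.items()}
    # Continuous, rankable score rather than a discrete argmax label.
    return probs["positive"] - probs["negative"]
```

Because the scores are continuous and comparable across assets, they can be sorted to rank stocks by sentiment strength, e.g. to form the long/short portfolios used in the backtests.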