FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address overfitting and poor generalization in supervised fine-tuning (SFT) of large language models (LLMs) for financial sentiment analysis, this paper proposes FinDPO, the first framework to bring Direct Preference Optimization (DPO) to the domain. FinDPO starts from an SFT-initialized causal language model and aligns its outputs with human preferences to enable fine-grained sentiment modeling. Its key innovation is a "logit-to-score" conversion that turns discrete sentiment predictions into continuous, rankable sentiment scores, directly supporting quantitative trading strategy construction. On standard benchmarks, FinDPO outperforms state-of-the-art supervised fine-tuned methods by 11% in average accuracy. Backtested simulations yield a 67% annualized return and a Sharpe ratio of 2.0, and these results remain robust under realistic transaction costs of 5 basis points (bps). The core contribution lies in adapting preference learning to financial text, markedly improving generalization to unseen market events and domain-specific terminology.
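The "logit-to-score" idea in the summary above can be illustrated with a minimal sketch. The paper states only that discrete sentiment predictions are converted into continuous, rankable scores derived from label probabilities; the exact formula below (a softmax over the class logits, then a signed positive-minus-negative difference) is an assumption for illustration, not the paper's stated implementation.

```python
import math

def logits_to_score(logits):
    """Turn per-class sentiment logits into one continuous, rankable score.

    `logits` maps the labels "positive", "negative", "neutral" to the
    model's raw logits for the corresponding label tokens. The signed
    difference used here is a hypothetical choice, not the paper's exact rule.
    """
    # Numerically stable softmax over the three class logits.
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    probs = {k: v / z for k, v in exps.items()}
    # Signed score in [-1, 1]: positive probability mass minus negative mass.
    return probs["positive"] - probs["negative"]
```

Because such scores are continuous and comparable across assets, they can be sorted to build a long/short portfolio (e.g., go long the top-ranked names and short the bottom-ranked ones), which is what makes the conversion useful for backtesting.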

📝 Abstract
Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on the average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel 'logit-to-score' conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).
Problem

Research questions and friction points this paper is trying to address.

Online sentiment increasingly drives trading decisions and market movements
Supervised fine-tuned LLMs memorize training data and generalize poorly to unseen events
Discrete sentiment labels are hard to convert into actionable trading signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

First application of Direct Preference Optimization to financial sentiment LLMs
"Logit-to-score" conversion turns discrete predictions into continuous, rankable scores
Simulated portfolios sustain a 67% annualized return (Sharpe ratio 2.0) under 5 bps transaction costs
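The preference-alignment step named in the bullets above is standard DPO, whose per-pair loss is well known from the DPO literature. As a hedged sketch (the `beta=0.1` default and scalar interface are illustrative assumptions, not the paper's reported settings):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair.

    logp_w, logp_l         : summed log-probs of the chosen (w) / rejected (l)
                             completion under the policy being trained
    ref_logp_w, ref_logp_l : the same quantities under the frozen SFT
                             reference model
    beta                   : strength of the implicit KL constraint
                             (illustrative default)
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion than the reference model does, scaled by beta.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return math.log1p(math.exp(-margin))
```

Minimizing this loss pushes the policy to assign relatively more probability mass to preferred completions than the SFT reference does, without an explicit reward model, which is why it serves as a post-SFT alignment stage here.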