AI Summary
This work addresses multi-step, goal-directed trading decisions in interactive financial markets. We propose an agent architecture that integrates large language models (LLMs) with gradient-driven reinforcement learning. Methodologically, we introduce the first approach to directly employ a partially fine-tuned LLM as a differentiable policy network, optimized end-to-end via REINFORCE policy gradients guided by trading rewards, thereby jointly enhancing linguistic understanding and financial decision-making. The framework combines parameter-efficient fine-tuning, multimodal financial data representation, and an LLM-agent design. Empirical evaluation on live-market simulation trading demonstrates statistically significant improvements in Sharpe ratio and win rate. The architecture also exhibits strong cross-task generalization across diverse downstream financial tasks, including portfolio optimization, event-driven trading, and risk forecasting.
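The two evaluation metrics named above can be computed from a series of per-period returns. The following is a minimal sketch, not the paper's evaluation code: the function names, the 252-period annualization factor, the zero risk-free rate, and the definition of win rate as the fraction of strictly positive periods are all assumptions made for illustration.

```python
import math


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    Assumptions (not from the paper): constant per-period risk-free rate,
    sample standard deviation (n - 1 denominator), 252 periods per year.
    """
    excess = [r - risk_free for r in returns]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(excess) / n
    var = sum((x - mean) ** 2 for x in excess) / (n - 1)
    if var == 0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def win_rate(returns):
    """Fraction of periods with a strictly positive return (assumed definition)."""
    if not returns:
        raise ValueError("need at least one return")
    return sum(1 for r in returns if r > 0) / len(returns)
```

For example, `win_rate([0.01, -0.005, 0.02, 0.0])` gives 0.5, since two of the four periods are strictly positive.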
Abstract
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
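To make the idea of a partially fine-tuned policy network trained by policy gradients concrete, here is a toy sketch under strong assumptions. It is not the FLAG-Trader implementation. A fixed random feature layer stands in for the frozen LLM layers, a small linear head stands in for the trainable parameters, and a plain REINFORCE update is used as one simple instance of policy-gradient optimization. The class and function names, the three-action space (short, flat, long), and the reward (position times next-period return) are all hypothetical.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # assumed action space: short, flat, long


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


class PartiallyFrozenPolicy:
    """Toy stand-in for a partially fine-tuned LLM policy (hypothetical)."""

    def __init__(self, n_features, n_hidden, rng):
        # Frozen weights: never updated, like the frozen pre-trained layers.
        self.frozen = [[rng.gauss(0, 1) for _ in range(n_features)]
                       for _ in range(n_hidden)]
        # Trainable head: the only parameters the policy gradient touches.
        self.head = [[0.0] * n_hidden for _ in ACTIONS]

    def probs(self, x):
        """Return (action probabilities, hidden features) for state x."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
             for row in self.frozen]
        logits = [sum(w * hj for w, hj in zip(row, h)) for row in self.head]
        return softmax(logits), h


def run_episode(policy, returns, rng):
    """Sample one trajectory; the state is (scaled previous return, bias)."""
    traj = []
    for t in range(1, len(returns)):
        x = [returns[t - 1] * 100.0, 1.0]
        p, h = policy.probs(x)
        a = rng.choices(range(len(ACTIONS)), weights=p)[0]
        reward = ACTIONS[a] * returns[t]  # assumed trading reward
        traj.append((h, p, a, reward))
    return traj


def reinforce_update(policy, traj, lr=0.5, gamma=0.99):
    """One REINFORCE step on the trainable head only."""
    # Discounted return-to-go G_t for every step of the trajectory.
    g, returns_to_go = 0.0, []
    for _, _, _, reward in reversed(traj):
        g = reward + gamma * g
        returns_to_go.append(g)
    returns_to_go.reverse()
    # For a softmax head, d log pi(a|s) / d head[k][j] = (1[k == a] - p_k) * h_j.
    for (h, p, a, _), g_t in zip(traj, returns_to_go):
        for k in range(len(ACTIONS)):
            coeff = lr * g_t * ((1.0 if k == a else 0.0) - p[k])
            for j, hj in enumerate(h):
                policy.head[k][j] += coeff * hj
```

A training loop would alternate `run_episode` and `reinforce_update` over many episodes. In the architecture the abstract describes, the frozen features would instead come from the pre-trained LLM and the trainable part from parameter-efficient fine-tuning; this sketch only mirrors that split between frozen and trainable parameters.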