INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent

📅 2024-12-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: The lack of unified, standardized evaluation benchmarks and datasets for financial decision-making hinders comparable and reliable assessment of LLM-based agents in multi-asset scenarios. Method: We introduce InvestorBench, the first benchmark platform specifically designed for evaluating LLM agents in financial decision-making, covering diverse asset classes including equities, cryptocurrencies, and ETFs. It establishes a standardized evaluation framework comprising 12 core tasks (e.g., risk control, asset allocation, trend forecasting), integrates multimodal open-source data (textual, time-series, chart-based), provides reproducible simulation environments, and adopts a unified agent architecture. We instantiate agents from 13 state-of-the-art LLMs, enhanced with reinforcement learning and chain-of-thought reasoning for decision-making. Contribution/Results: InvestorBench enables systematic, cross-model, cross-market, and cross-task evaluation. Experiments demonstrate significantly improved assessment consistency and reproducibility. All code and data are fully open-sourced.

📝 Abstract

Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. InvestorBench enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks, cryptocurrencies and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source, multi-modal datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents' performance across various scenarios.

Problem

Research questions and friction points this paper is trying to address.

Financial Decision-Making

Large Language Models

Standardized Testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

InvestorBench

Financial Decision-Making

Language Model Evaluation
