InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior

📅 2025-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
In behavioral finance research, aligning large language models (LLMs) with investor decision-making under herd behavior is hindered by the scarcity of real-user data, which impedes effective supervised fine-tuning (SFT). To address this, we propose InvestAlign, a framework that avoids reliance on sensitive real-user data by using interpretable theoretical solutions to similar, simpler optimal investment problems as SFT supervision signals. This reduces data collection costs and privacy risks. The framework integrates theoretical modeling, SFT, and LLM agent techniques to instantiate InvestAgent, an LLM-based investment agent. A theoretical analysis shows that training on InvestAlign-generated data achieves faster parameter convergence than training on real-user data, and experiments show that InvestAgent aligns significantly more closely with real-user data than pre-SFT models in both simple and complex investment problems. To our knowledge, this is the first work to realize theory-driven behavioral finance alignment for LLMs.

📝 Abstract
Aligning Large Language Models (LLMs) with investor decision-making processes under herd behavior is a critical challenge in behavioral finance, which grapples with a fundamental limitation: the scarcity of real-user data needed for Supervised Fine-Tuning (SFT). While SFT can bridge the gap between LLM outputs and human behavioral patterns, its reliance on massive authentic data imposes substantial collection costs and privacy risks. We propose InvestAlign, a novel framework that constructs high-quality SFT datasets by leveraging theoretical solutions to similar and simple optimal investment problems rather than complex scenarios. Our theoretical analysis demonstrates that training LLMs with InvestAlign-generated data achieves faster parameter convergence than using real-user data, suggesting superior learning efficiency. Furthermore, we develop InvestAgent, an LLM agent fine-tuned with InvestAlign, which demonstrates significantly closer alignment to real-user data than pre-SFT models in both simple and complex investment problems. This highlights our proposed InvestAlign as a promising approach with the potential to address complex optimal investment problems and align LLMs with investor decision-making processes under herd behavior. Our code is publicly available at https://github.com/thu-social-network-research-group/InvestAlign.
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs with investor decisions under herd behavior
Overcoming data scarcity for supervised fine-tuning in finance
Generating synthetic data to replace real-user investment data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages theoretical solutions for SFT datasets
Achieves faster parameter convergence than real data
Fine-tunes LLM agent closer to real-user data
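
The idea of using theoretical solutions as SFT supervision can be illustrated with a minimal sketch. It does not reproduce the paper's model: as a stand-in, it uses the classic Merton fraction for a CRRA investor, which ignores herd behavior entirely. The function names, prompt wording, and parameter values are all hypothetical and chosen only for illustration.

```python
import json


def merton_fraction(mu, r, sigma, gamma):
    """Classic Merton fraction of wealth held in the risky asset.

    Stand-in closed-form solution (assumption): the paper's own
    optimal investment problem includes herd behavior and differs.
    """
    return (mu - r) / (gamma * sigma ** 2)


def make_sft_example(mu, r, sigma, gamma):
    """Turn one parameter setting into a prompt/response pair."""
    frac = merton_fraction(mu, r, sigma, gamma)
    prompt = (
        f"Expected return {mu:.1%}, risk-free rate {r:.1%}, "
        f"volatility {sigma:.1%}, risk aversion {gamma}. "
        "What fraction of wealth do you invest in the risky asset?"
    )
    response = f"Invest {frac:.1%} of wealth in the risky asset."
    return {"prompt": prompt, "response": response}


def build_dataset(param_grid):
    """Build a list of SFT records from a list of parameter dicts."""
    return [make_sft_example(**params) for params in param_grid]


if __name__ == "__main__":
    # Hypothetical parameter grid; values are illustrative only.
    grid = [
        {"mu": 0.08, "r": 0.02, "sigma": 0.2, "gamma": g}
        for g in (1.5, 3.0, 6.0)
    ]
    for record in build_dataset(grid):
        print(json.dumps(record))
```

Each record pairs a scenario description with the answer the theory prescribes, so no real-user data is needed to produce the supervision signal. The actual InvestAlign pipeline is available in the repository linked in the abstract.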
Huisheng Wang
Department of Automation, Tsinghua University, Beijing, China
Zhuoshi Pan
Tsinghua University
deep learning, natural language processing
Hangjing Zhang
Department of Automation, Tsinghua University, Beijing, China
Mingxiao Liu
Department of Automation, Tsinghua University, Beijing, China
Hanqing Gao
Department of Automation, Tsinghua University, Beijing, China
H. Vicky Zhao
Tsinghua University, China
signal processing, multimedia, information forensics and security, social media