🤖 AI Summary
Existing evaluation paradigms struggle to assess how well large language models handle macro- and sector-level asset allocation in dynamic financial environments, and they lack reproducible benchmarks grounded in real-world market attention. To address this gap, this work proposes CN-Buzz2Portfolio, the first public-attention-driven macro asset allocation benchmark tailored to the Chinese market, which maps daily trending financial news to ETF portfolio decisions. The framework employs a three-stage Compress–Perceive–Allocate (CPA) agent workflow that integrates news popularity tracking, textual compression, macroeconomic awareness, and portfolio optimization within an end-to-end evaluation pipeline, and it uses rolling time windows to simulate realistic information dynamics. Experiments across nine mainstream large language models show that the benchmark effectively differentiates models’ ability to translate macroeconomic narratives into portfolio weights. Code and data are publicly released.
📝 Abstract
Large Language Models (LLMs) are rapidly transitioning from static Natural Language Processing (NLP) tasks, such as sentiment analysis and event extraction, to acting as dynamic decision-making agents in complex financial environments. However, evaluating LLMs as autonomous financial agents poses a significant dilemma. Direct live trading is irreproducible and prone to outcome bias because it confounds luck with skill, whereas existing static benchmarks are often confined to entity-level stock picking and ignore broader market attention. To enable rigorous analysis of these challenges, we introduce CN-Buzz2Portfolio, a reproducible benchmark grounded in the Chinese market that maps daily trending news to macro and sector asset allocation. Spanning a rolling horizon from 2024 to mid-2025, our dataset simulates a realistic public attention stream, requiring agents to distill investment logic from high-exposure narratives instead of pre-filtered entity news. We propose a Tri-Stage CPA Agent Workflow, comprising Compression, Perception, and Allocation, that evaluates LLMs on broad asset classes such as Exchange Traded Funds (ETFs) rather than individual stocks, thereby reducing idiosyncratic volatility. Extensive experiments on nine LLMs reveal significant disparities in how models translate macro-level narratives into portfolio weights. This work provides new insights into the alignment between general reasoning and financial decision-making, and all data, code, and experiments are released to promote sustainable financial agent research.
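To make the three stages concrete, the following is a minimal illustrative sketch of a Compression, Perception, Allocation pipeline. It is not the authors' implementation: every name here (`NewsItem`, `compress`, `perceive`, `allocate`, `run_cpa`, the `popularity` field, the ETF labels, and the keyword lists) is hypothetical, and the Perception stage uses simple keyword counting as a stand-in for the LLM reasoning that the benchmark actually evaluates.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    title: str
    popularity: float  # hypothetical "buzz" score attached to a headline


def compress(items: list[NewsItem], top_k: int = 2) -> list[str]:
    """Stage 1 (Compression): keep only the top_k most popular headlines."""
    ranked = sorted(items, key=lambda item: item.popularity, reverse=True)
    return [item.title for item in ranked[:top_k]]


def perceive(headlines: list[str], keywords: dict[str, list[str]]) -> dict[str, float]:
    """Stage 2 (Perception): toy stand-in for an LLM.

    Scores each ETF by counting how many of its keywords appear as
    whole words in the compressed headlines.
    """
    scores: dict[str, float] = {}
    for etf, words in keywords.items():
        hits = 0
        for headline in headlines:
            tokens = set(headline.lower().split())
            hits += sum(1 for word in words if word in tokens)
        scores[etf] = float(hits)
    return scores


def allocate(scores: dict[str, float]) -> dict[str, float]:
    """Stage 3 (Allocation): map non-negative scores to long-only weights summing to 1."""
    total = sum(scores.values())
    if total == 0:
        # No signal at all: fall back to equal weights.
        return {etf: 1.0 / len(scores) for etf in scores}
    return {etf: score / total for etf, score in scores.items()}


def run_cpa(items: list[NewsItem], keywords: dict[str, list[str]], top_k: int = 2) -> dict[str, float]:
    """Chain the three stages for a single decision step."""
    return allocate(perceive(compress(items, top_k), keywords))


# Made-up example data, for illustration only.
example_items = [
    NewsItem("Central bank cuts rates and bonds rally", 9.0),
    NewsItem("Chip makers surge on strong demand", 7.5),
    NewsItem("Local festival draws crowds", 1.0),
]
example_keywords = {"bond_etf": ["bonds", "rates"], "tech_etf": ["chip"]}
example_weights = run_cpa(example_items, example_keywords)
print(example_weights)
```

In this toy setup the low-popularity headline is dropped in the Compression stage, and the remaining headlines tilt the weights toward the ETF with more keyword hits. A real agent would replace `perceive` with a model call and would re-run the pipeline over a rolling window of daily news.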