AI Summary
This work addresses policy drift in large language model (LLM)-driven financial trading under low signal-to-noise ratios and delayed rewards, where unconstrained prompt optimization often conflates logic flaws with market noise. To mitigate this, the authors propose a neuro-symbolic framework that confines trading strategies to a human-readable rubric of condition-action rules. The approach enables atomic strategy revisions through cross-sample attribution analysis and regularizes them with walk-forward validation. By integrating structured, auditable symbolic rules into LLM-based autonomous trading systems (described as a first in this domain), the method improves robustness across three stock sectors and four LLM backbones. Notably, it raises the performance of smaller models by 10 to 20 percentage points on average while preserving both interpretability and the capacity for self-evolution.
Abstract
Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from stochastic market variance, which leads to policy drift. To overcome this bottleneck, we introduce the Self-Evolving Human-Auditable Rubric Policy (SHARP), a neuro-symbolic framework that replaces unconstrained text mutation with structured, symbolic policy optimization. SHARP confines the agent's reasoning to a bounded, human-readable rubric of explicit condition-action rules. When sub-optimal trades occur, an attribution agent reasons across multiple samples to isolate specific rule failures. This enables targeted, atomic policy edits that are subsequently regularized through strict walk-forward validation. Evaluated across three diverse equity sectors and four LLM backbones, SHARP consistently transforms generic initial heuristics into robust strategies, lifting the empirical performance of compact models (e.g., GPT-4o-mini) by 10 to 20 percentage points on average. Ultimately, SHARP demonstrates that LLMs can achieve dynamic and efficient adaptation while significantly enhancing the structural transparency and auditability demanded by institutional finance.
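The loop described in the abstract (a bounded rubric of condition-action rules, cross-sample attribution, an atomic edit, and walk-forward acceptance) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Rule`, `decide`, `attribute`, `atomic_edit`, `walk_forward_accept`), the feature names, and the numbers are hypothetical, and the LLM attribution agent is replaced by a simple per-rule P&L aggregation purely for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-period features; "ret" is the return earned by a long position.
Bar = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """One condition-action rule: take `action` when `feature` > `threshold`."""
    name: str
    feature: str
    threshold: float
    action: int  # +1 long, -1 short, 0 flat


def decide(rubric: Sequence[Rule], bar: Bar) -> Tuple[int, str]:
    """Return (action, name of the rule that fired); the first match wins."""
    for rule in rubric:
        if bar[rule.feature] > rule.threshold:
            return rule.action, rule.name
    return 0, "default_flat"


def pnl(rubric: Sequence[Rule], bars: Sequence[Bar]) -> float:
    """Total P&L of the rubric over a window of bars."""
    return sum(decide(rubric, b)[0] * b["ret"] for b in bars)


def attribute(rubric: Sequence[Rule], bars: Sequence[Bar]) -> Dict[str, float]:
    """Aggregate P&L per fired rule across many samples, not a single trade."""
    totals: Dict[str, float] = {}
    for b in bars:
        action, name = decide(rubric, b)
        totals[name] = totals.get(name, 0.0) + action * b["ret"]
    return totals


def atomic_edit(rubric: Sequence[Rule], rule_name: str, new_threshold: float) -> List[Rule]:
    """Change exactly one rule's threshold; all other rules stay untouched."""
    return [replace(r, threshold=new_threshold) if r.name == rule_name else r
            for r in rubric]


def walk_forward_accept(old: Sequence[Rule], new: Sequence[Rule],
                        later_bars: Sequence[Bar]) -> bool:
    """Keep an edit only if it beats the old rubric on later, unseen bars."""
    return pnl(new, later_bars) > pnl(old, later_bars)


# Toy data: an earlier window for attribution, a later window for validation.
rubric = [Rule("buy_momentum", "momentum", 0.00, +1)]
train = [{"momentum": 0.01, "ret": -0.02},
         {"momentum": 0.05, "ret": 0.03},
         {"momentum": 0.02, "ret": -0.02}]
later = [{"momentum": 0.01, "ret": -0.01},
         {"momentum": 0.06, "ret": 0.02}]

worst_rule = min(attribute(rubric, train).items(), key=lambda kv: kv[1])[0]
candidate = atomic_edit(rubric, worst_rule, 0.03)   # tighten one rule only
too_strict = atomic_edit(rubric, worst_rule, 0.10)  # never trades on `later`

accepted = walk_forward_accept(rubric, candidate, later)
rejected = walk_forward_accept(rubric, too_strict, later)
if accepted:
    rubric = candidate
```

In this sketch the edit is scoped to a single named rule, so a reviewer can see exactly what changed, and the acceptance test uses only bars that come after the attribution window, which is the sense in which walk-forward validation acts as a regularizer against fitting noise.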