SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses strategy drift in large language model (LLM)-driven financial trading under low signal-to-noise ratios and delayed rewards, where unconstrained prompt optimization often conflates logical flaws with market noise. To mitigate this, the authors propose SHARP, a neuro-symbolic framework that confines the trading strategy to a bounded, human-readable rubric of condition-action rules. An attribution agent reasons across multiple trade samples to isolate the specific rule that failed, which enables atomic strategy revisions; each revision is then regularized through walk-forward validation. By integrating structured, auditable symbolic rules into LLM-based autonomous trading systems, described here as a first in this domain, the method improves robustness across three equity sectors and four LLM backbones. Notably, it lifts the performance of compact models such as GPT-4o-mini by 10 to 20 percentage points on average while preserving both interpretability and the capacity for self-evolution.
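The cross-sample attribution idea can be illustrated with a small sketch. In the paper the attribution step is performed by an LLM agent; the version below swaps in a plain frequency heuristic purely for illustration. The `Trade` record, the rule ids, and the `min_support` and `min_loss_rate` thresholds are all hypothetical and do not come from the paper. The point it shows is that a rule is flagged only when it loses repeatedly across several trades, so a single bad outcome (which may be market noise) is not enough to trigger an edit.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    fired_rules: tuple  # ids of the rubric rules that drove the decision (hypothetical)
    pnl: float          # realized profit or loss of the trade


def attribute_failures(trades, min_support=3, min_loss_rate=0.6):
    """Return ids of rules that lose consistently across many trades.

    A frequency heuristic standing in for the LLM attribution agent:
    a rule is flagged only if it fired at least `min_support` times and
    at least `min_loss_rate` of those trades lost money.
    """
    fired = Counter()
    lost = Counter()
    for trade in trades:
        for rule_id in set(trade.fired_rules):
            fired[rule_id] += 1
            if trade.pnl < 0:
                lost[rule_id] += 1
    return sorted(
        rule_id
        for rule_id, n in fired.items()
        if n >= min_support and lost[rule_id] / n >= min_loss_rate
    )


# Toy trade log (made-up numbers).
trades = [
    Trade(("momentum_buy",), -1.2),
    Trade(("momentum_buy", "volume_filter"), -0.8),
    Trade(("momentum_buy",), 0.5),
    Trade(("momentum_buy",), -2.0),
    Trade(("volume_filter",), 1.1),
    Trade(("earnings_hold",), -3.0),
]

print(attribute_failures(trades))  # ['momentum_buy']
```

Here `earnings_hold` produced the largest single loss but fired only once, so it is not flagged. `momentum_buy` lost on three of its four trades and is the only candidate for revision.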

📝 Abstract

Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from stochastic market variance, inevitably leading to policy drift. To overcome this bottleneck, we introduce the Self-Evolving Human-Auditable Rubric Policy (SHARP), a neuro-symbolic framework that replaces unconstrained text mutation with structured, symbolic policy optimization. SHARP confines the agent's reasoning to a bounded, human-readable rubric of explicit condition-action rules. When sub-optimal trades occur, an attribution agent employs cross-sample reasoning to isolate specific rule failures. This enables targeted, atomic policy edits that are subsequently regularized through strict walk-forward validation. Evaluated across three diverse equity sectors and four LLM backbones, SHARP consistently transforms generic initial heuristics into highly robust strategies, lifting the empirical performance of compact models (e.g., GPT-4o-mini) by 10 to 20 percentage points on average. Ultimately, SHARP demonstrates that LLMs can achieve dynamic and efficient adaptation while significantly enhancing the structural transparency and auditability demanded by institutional finance.
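A minimal sketch of how a condition-action rubric, an atomic edit, and walk-forward validation might fit together is given below. It is not the paper's implementation: the `Rule` fields, the single scalar `signal`, the toy data, and the acceptance criterion (the edited rubric must never underperform the original on any later fold and must beat it on at least one) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    rule_id: str
    threshold: float  # condition: signal > threshold (hypothetical form)
    action: str       # "buy" or "hold"

    def describe(self):
        # Human-readable rendering of the rule for auditing.
        return f"{self.rule_id}: IF signal > {self.threshold} THEN {self.action}"


def rubric_return(rules, signals, returns):
    """Sum the period returns on periods where any buy rule fires."""
    total = 0.0
    for signal, ret in zip(signals, returns):
        if any(r.action == "buy" and signal > r.threshold for r in rules):
            total += ret
    return total


def atomic_edit(rules, rule_id, new_threshold):
    """Change one rule's threshold and leave every other rule untouched."""
    return [replace(r, threshold=new_threshold) if r.rule_id == rule_id else r
            for r in rules]


def walk_forward_accept(old, new, signals, returns, start, fold):
    """Compare old and edited rubrics on consecutive folds after `start`.

    Assumed criterion: reject if the edit loses on any fold,
    accept only if it strictly wins on at least one.
    """
    wins = 0
    for lo in range(start, len(signals) - fold + 1, fold):
        hi = lo + fold
        old_r = rubric_return(old, signals[lo:hi], returns[lo:hi])
        new_r = rubric_return(new, signals[lo:hi], returns[lo:hi])
        if new_r < old_r:
            return False
        wins += new_r > old_r
    return wins > 0


# Toy data (made-up numbers): the first 4 periods motivated the edit,
# the remaining 8 form two forward folds.
signals = [0.2, 0.6, 0.9, 0.4, 0.6, 0.9, 0.3, 0.7, 0.8, 0.6, 0.2, 0.9]
returns = [0.0, -1.0, 2.0, 0.0, -1.0, 1.5, 0.0, -0.5, 1.0, -0.5, 0.0, 2.0]

rubric = [Rule("momentum_buy", 0.5, "buy"), Rule("risk_hold", 2.0, "hold")]
edited = atomic_edit(rubric, "momentum_buy", 0.75)
too_strict = atomic_edit(rubric, "momentum_buy", 0.95)

print(edited[0].describe())
print(walk_forward_accept(rubric, edited, signals, returns, start=4, fold=4))      # True
print(walk_forward_accept(rubric, too_strict, signals, returns, start=4, fold=4))  # False
```

Because the edit touches a single rule and every rule renders as a plain `IF ... THEN ...` line, a reviewer can diff the rubric before and after each revision. The validation gate is what keeps an edit motivated by a few noisy trades from being adopted unless it also holds up on later data.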

Problem

Research questions and friction points this paper is trying to address.

credit assignment problem

policy drift

financial trading agents

non-stationary markets

low signal-to-noise ratio

Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic

structured policy optimization

human-auditable rubric