PersonaLedger: Generating Realistic Financial Transactions with Persona Conditioned LLMs and Rule Grounded Feedback

📅 2026-01-06
🏛️ arXiv.org
🤖 AI Summary
Existing methods for synthesizing financial transactions struggle to capture behavioral diversity and logical coherence at the same time, and many depend on sensitive real-world data whose use is restricted by privacy regulations. This work proposes a framework that couples a large language model conditioned on user personas with a configurable rule engine to generate privacy-preserving transaction streams in a closed loop. After each event, the engine feeds the updated user state back into the next prompt and enforces hard financial constraints, which keeps the generated streams feasible. To support reproducible research in financial AI, the authors release a synthetic dataset of 30 million transactions from 23,000 users, along with two benchmark tasks: illiquidity classification and identity theft segmentation.

📝 Abstract
Strict privacy regulations limit access to real transaction data, slowing open research in financial AI. Synthetic data can bridge this gap, but existing generators do not jointly achieve behavioral diversity and logical groundedness. Rule-driven simulators rely on hand-crafted workflows and shallow stochasticity, which miss the richness of human behavior. Learning-based generators such as GANs capture correlations yet often violate hard financial constraints and still require training on private data. We introduce PersonaLedger, a generation engine that uses a large language model conditioned on rich user personas to produce diverse transaction streams, coupled with an expert-configurable programmatic engine that maintains correctness. The LLM and engine interact in a closed loop: after each event, the engine updates the user state, enforces financial rules, and returns a context-aware "nextprompt" that guides the LLM toward feasible next actions. With this engine, we create a public dataset of 30 million transactions from 23,000 users and a benchmark suite with two tasks, illiquidity classification and identity theft segmentation. PersonaLedger offers the community a rich, realistic, and privacy-preserving resource, complete with code, rules, and generation logs, that supports rigorous, reproducible evaluation of forecasting and anomaly detection models and helps accelerate innovation in financial AI.
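The closed loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `UserState`, `stub_llm`, `apply_rules`, `next_prompt`, and `generate` are hypothetical names, the LLM is replaced by a random stub, and the two rules shown (no overdraft, no spending past the credit limit) are assumed examples of hard financial constraints rather than rules taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class UserState:
    """Hypothetical per-user state tracked by the rule engine."""
    balance: float
    credit_limit: float
    credit_used: float = 0.0
    history: list = field(default_factory=list)


def stub_llm(persona: str, prompt: str, rng: random.Random) -> dict:
    """Stand-in for the persona-conditioned LLM: proposes one transaction.

    A real system would send `persona` and `prompt` to a model; this stub
    ignores both and samples at random so the sketch stays runnable.
    """
    kind = rng.choice(["debit", "credit_card", "deposit"])
    return {"kind": kind, "amount": round(rng.uniform(5, 400), 2)}


def apply_rules(state: UserState, event: dict) -> bool:
    """Enforce assumed hard constraints; update state only if they hold."""
    kind, amount = event["kind"], event["amount"]
    if amount <= 0:
        return False
    if kind == "debit":
        if amount > state.balance:  # assumed rule: no overdraft
            return False
        state.balance -= amount
    elif kind == "credit_card":
        if state.credit_used + amount > state.credit_limit:  # assumed rule
            return False
        state.credit_used += amount
    elif kind == "deposit":
        state.balance += amount
    else:
        return False
    state.history.append(event)
    return True


def next_prompt(state: UserState, accepted: bool) -> str:
    """Build the context-aware prompt returned to the LLM after each event."""
    available = state.credit_limit - state.credit_used
    note = "" if accepted else " The last proposal broke a rule; stay within limits."
    return (f"Balance: {state.balance:.2f}; available credit: {available:.2f}."
            f"{note} Propose the next transaction.")


def generate(persona: str, state: UserState, steps: int, seed: int = 0) -> list:
    """Run the loop: propose, check rules, update state, feed back a prompt."""
    rng = random.Random(seed)
    prompt = next_prompt(state, True)
    for _ in range(steps):
        event = stub_llm(persona, prompt, rng)
        accepted = apply_rules(state, event)
        prompt = next_prompt(state, accepted)
    return state.history
```

In this sketch a rejected proposal never reaches the history, so every recorded transaction satisfies the constraints by construction, while the prompt carries the updated state forward so that a real model could steer toward feasible actions.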
Problem

Research questions and friction points this paper is trying to address.

synthetic financial data
privacy-preserving generation
behavioral diversity
logical groundedness
financial AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona-Conditioned LLM
Rule-Grounded Feedback
Synthetic Financial Transactions
Closed-Loop Generation
Privacy-Preserving Benchmark