PersonaLedger: Generating Realistic Financial Transactions with Persona Conditioned LLMs and Rule Grounded Feedback

📅 2026-01-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for synthesizing financial transactions struggle to capture behavioral diversity and logical coherence at the same time, and learning-based generators still depend on sensitive real-world data that is restricted by privacy regulations. This work proposes a framework that couples a persona-conditioned large language model with a configurable rule engine to generate diverse, privacy-preserving transaction streams in a closed loop. After each event, the engine updates the user state, enforces hard financial constraints, and feeds a state-aware prompt back to the LLM. To support reproducible research in financial AI, the authors release a synthetic dataset of 30 million transactions from 23,000 users, along with two benchmark tasks: illiquidity classification and identity theft segmentation.

📝 Abstract

Strict privacy regulations limit access to real transaction data, slowing open research in financial AI. Synthetic data can bridge this gap, but existing generators do not jointly achieve behavioral diversity and logical groundedness. Rule-driven simulators rely on hand-crafted workflows and shallow stochasticity, which miss the richness of human behavior. Learning-based generators such as GANs capture correlations yet often violate hard financial constraints and still require training on private data. We introduce PersonaLedger, a generation engine that uses a large language model conditioned on rich user personas to produce diverse transaction streams, coupled with an expert-configurable programmatic engine that maintains correctness. The LLM and engine interact in a closed loop: after each event, the engine updates the user state, enforces financial rules, and returns a context-aware "next prompt" that guides the LLM toward feasible next actions. With this engine, we create a public dataset of 30 million transactions from 23,000 users and a benchmark suite with two tasks, illiquidity classification and identity theft segmentation. PersonaLedger offers the community a rich, realistic, and privacy-preserving resource, complete with code, rules, and generation logs, that supports rigorous, reproducible evaluation of forecasting and anomaly detection models and aims to accelerate innovation in financial AI.
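The closed loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `stub_llm` is a hypothetical stand-in that cycles through fixed proposals instead of calling a real model, and `rule_engine`, `UserState`, the single no-overdraft rule, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserState:
    """Hypothetical per-user state tracked by the rule engine."""
    balance: float
    history: list = field(default_factory=list)


def stub_llm(persona, prompt, step):
    """Stand-in for the persona-conditioned LLM (assumption).

    A real system would send `persona` and `prompt` to a model;
    here we cycle through fixed proposals so the sketch is runnable.
    """
    proposals = [
        {"type": "debit", "merchant": "grocery", "amount": 60.0},
        {"type": "debit", "merchant": "electronics", "amount": 500.0},
        {"type": "credit", "merchant": "payroll", "amount": 1000.0},
    ]
    return proposals[step % len(proposals)]


def rule_engine(state, event):
    """Enforce one hard rule (no overdraft), update state, build the next prompt."""
    if event["type"] == "debit" and event["amount"] > state.balance:
        accepted = False  # infeasible event: state is left unchanged
    else:
        sign = 1.0 if event["type"] == "credit" else -1.0
        state.balance += sign * event["amount"]
        state.history.append(event)
        accepted = True

    # Context-aware "next prompt": tells the generator what is feasible now.
    next_prompt = (
        f"Balance is {state.balance:.2f}. Debits must not exceed the balance. "
        "Propose the next transaction."
    )
    if not accepted:
        next_prompt = "Last proposal rejected (insufficient funds). " + next_prompt
    return accepted, next_prompt


def generate(persona, start_balance, steps):
    """Run the propose -> validate -> re-prompt loop for a fixed number of steps."""
    state = UserState(balance=start_balance)
    prompt = f"Balance is {start_balance:.2f}. Propose the first transaction."
    for step in range(steps):
        event = stub_llm(persona, prompt, step)
        _, prompt = rule_engine(state, event)
    return state


if __name__ == "__main__":
    final_state = generate("graduate student on a tight budget", 100.0, 3)
    print(final_state.balance, len(final_state.history))
```

In this sketch the second proposal (a 500.00 debit against a 40.00 balance) is rejected and never enters the history, which mirrors the idea that the engine, not the LLM, has the final say on financial feasibility.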

Problem

Research questions and friction points this paper is trying to address.

synthetic financial data

privacy-preserving generation

behavioral diversity

logical groundedness

financial AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Persona-Conditioned LLM

Rule-Grounded Feedback

Synthetic Financial Transactions