From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This survey addresses insufficient algorithmic reproducibility in financial AI systems. The cause is mechanical nondeterminism induced by hardware and architecture, which undermines regulatory auditability in critical applications such as credit risk assessment, fraud detection, and anti-money laundering. The work analyzes reproducibility failure mechanisms across three dominant modalities: tabular models, graph neural networks, and large language model agents. It links modality-specific metrics (RBO, D_cos, TDI, PSD) to audit readiness through a layered evaluation framework. First-party experiments on public financial datasets quantify explanation rank instability in credit scoring, prediction flip rates in GNN-based fraud detection, and tensor-parallel-induced output divergence in LLM entity extraction, and they validate that logit-level and semantic-level determinism measures are complementary. The survey thereby shifts the focus of financial AI from accuracy alone toward auditability.
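To make two of the quantities named in the summary concrete, here is a minimal Python sketch of a prediction flip rate and a cosine distance between output embeddings. The function names `flip_rate` and `cosine_distance` are hypothetical. Reading D_cos as one minus cosine similarity is an assumption; this listing does not give the paper's exact definitions.

```python
import math
from typing import Sequence


def flip_rate(run_a: Sequence[int], run_b: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs between two runs.

    Assumption: both runs score the same samples in the same order.
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs must cover the same samples")
    if not run_a:
        raise ValueError("runs must be non-empty")
    flips = sum(1 for a, b in zip(run_a, run_b) if a != b)
    return flips / len(run_a)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two embedding vectors.

    Assumption: this is what D_cos denotes; the listing does not define it.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Two hypothetical runs of the same fraud classifier on four transactions.
    print(flip_rate([1, 0, 1, 1], [1, 1, 1, 0]))
    # Two hypothetical embeddings of LLM outputs for the same prompt.
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

A flip rate compares hard decisions, while a cosine distance compares continuous representations, so the two can disagree: outputs may drift in embedding space without any label flipping, and the reverse can also happen.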

📝 Abstract

Deploying machine learning in regulated financial environments -- credit risk, fraud detection, and anti-money laundering -- exposes critical vulnerabilities in algorithmic reproducibility. While early financial ML addressed statistical challenges such as backtest overfitting, deep neural networks and Generative AI have introduced mechanical nondeterminism rooted in hardware and architecture. This survey provides a systems perspective on reproducibility failures across three modalities now dominant in financial AI: tabular models (post-hoc explanation variance), graph networks (stochastic sampling and temporal asynchrony), and LLM-based agentic workflows (batch-dependent divergence and trajectory drift). We supplement the literature analysis with first-party experiments on public financial datasets -- quantifying explanation rank instability in credit scoring, prediction flip rates in GNN-based fraud detection, and tensor-parallel-induced output divergence in LLM entity extraction. We propose a layered evaluation framework linking modality-specific metrics (RBO, D_cos, TDI, PSD) to audit readiness, and empirically validate the complementarity of logit-level and semantic-level determinism measures.
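As an illustration of how explanation rank instability might be scored, the sketch below computes a truncated rank-biased overlap (RBO) between two feature-importance rankings. It is an assumption-laden sketch, not the paper's implementation: the function name `rbo_truncated`, the persistence parameter `p = 0.9`, and the feature names are hypothetical. The truncated sum omits the extrapolation term of full RBO, so two identical rankings of length k score 1 - p^k rather than 1.

```python
from typing import Hashable, Sequence


def rbo_truncated(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.9) -> float:
    """Truncated rank-biased overlap of two rankings.

    Computes (1 - p) * sum_{d=1..k} p^(d-1) * |a[:d] & b[:d]| / d,
    where k is the length of the shorter ranking.
    Assumption: items are unique within each ranking.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(a), len(b))
    score = 0.0
    for d in range(1, depth + 1):
        # Agreement at depth d: share of the top-d items common to both lists.
        agreement = len(set(a[:d]) & set(b[:d])) / d
        score += p ** (d - 1) * agreement
    return (1.0 - p) * score


if __name__ == "__main__":
    # Hypothetical top-3 feature attributions from two explainer runs
    # on the same credit-scoring model and the same applicant.
    run_1 = ["income", "debt_ratio", "age"]
    run_2 = ["debt_ratio", "income", "age"]
    print(rbo_truncated(run_1, run_1))  # identical rankings
    print(rbo_truncated(run_1, run_2))  # top two features swapped
```

Because the weights p^(d-1) decay with depth, disagreement near the top of the ranking lowers the score more than disagreement further down, which suits audits that focus on the few features cited in an adverse-action explanation.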

Problem

Research questions and friction points this paper is trying to address.

reproducibility

nondeterminism

financial AI

auditability

algorithmic accountability

Innovation

Methods, ideas, or system contributions that make the work stand out.

determinism

auditability

reproducibility