🤖 AI Summary
This work addresses a critical gap in current AI governance paradigms, which focus predominantly on evaluating outcomes while neglecting the verifiability of reasoning processes. To remedy this, the paper introduces a novel “AI Integrity” framework centered on an Authority Stack comprising four layers -- normative, epistemic, source, and data authority -- and formally defines Integrity Hallucination as the central measurable threat. Emphasizing auditability of reasoning over prescribed value alignment, the authors integrate Schwartz's theory of basic human values, Walton's argumentation schemes, the GRADE/CEBM evidence hierarchies, and Source Credibility Theory to develop the PRISM framework, which operationalizes reasoning integrity through six core metrics and enables transparent, end-to-end verification of the inferential pathway from evidence to conclusion.
📝 Abstract
AI systems increasingly shape high-stakes decisions in healthcare, law, defense, and education, yet existing governance paradigms -- AI Ethics, AI Safety, and AI Alignment -- share a common limitation: they evaluate outcomes rather than verifying the reasoning process itself. This paper introduces AI Integrity, defined as a state in which the Authority Stack of an AI system -- its layered hierarchy of values, epistemological standards, source preferences, and data selection criteria -- is protected from corruption, contamination, manipulation, and bias, and is maintained in a verifiable manner. We distinguish AI Integrity from the three existing paradigms and define the Authority Stack as a four-layer cascade model (Normative, Epistemic, Source, and Data Authority) grounded in established academic frameworks: Schwartz's Basic Human Values for normative authority, Walton's argumentation schemes with the GRADE/CEBM evidence hierarchies for epistemic authority, and Source Credibility Theory for source authority. We characterize the distinction between legitimate cascading and Authority Pollution, and identify Integrity Hallucination as the central measurable threat to value consistency. We further specify the PRISM (Profile-based Reasoning Integrity Stack Measurement) framework as the operational methodology, defining six core metrics and a phased research roadmap. Unlike normative frameworks that prescribe which values are correct, AI Integrity is a procedural concept: it requires that the path from evidence to conclusion be transparent and auditable, regardless of which values a system holds.
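The paper does not include reference code, but the layered structure of the Authority Stack lends itself to a concrete sketch. The Python below is a minimal, hypothetical rendering of the four-layer cascade and a procedural audit trail; all class names, fields, and the `audit_trace` helper are illustrative assumptions, not the authors' PRISM implementation.

```python
from dataclasses import dataclass

# Illustrative sketch only: the paper defines the four authority layers
# conceptually; every class name, field, and method here is a hypothetical
# rendering, not the authors' schema.

@dataclass
class NormativeAuthority:
    """Layer 1: which values govern the system (Schwartz's Basic Human Values)."""
    values: list[str]                  # e.g., ["benevolence", "security"]

@dataclass
class EpistemicAuthority:
    """Layer 2: which reasoning and evidence standards apply (Walton schemes, GRADE/CEBM)."""
    argumentation_schemes: list[str]   # e.g., ["argument_from_expert_opinion"]
    evidence_hierarchy: list[str]      # ordered highest to lowest, e.g., GRADE levels

@dataclass
class SourceAuthority:
    """Layer 3: which sources are preferred (Source Credibility Theory)."""
    source_weights: dict[str, float]   # source identifier -> credibility weight in [0, 1]

@dataclass
class DataAuthority:
    """Layer 4: which data are admissible for a given claim."""
    selection_criteria: list[str]      # e.g., ["peer_reviewed", "primary_source"]

@dataclass
class AuthorityStack:
    """Four-layer cascade: each layer constrains the one below it."""
    normative: NormativeAuthority
    epistemic: EpistemicAuthority
    source: SourceAuthority
    data: DataAuthority

    def audit_trace(self, evidence: str, conclusion: str) -> list[str]:
        """Record every layer that touched one inference, from evidence to conclusion.

        AI Integrity is procedural: the trace does not judge whether the
        values are correct, only whether the path is transparent and auditable.
        """
        return [
            f"values in force: {self.normative.values}",
            f"evidence graded against: {self.epistemic.evidence_hierarchy}",
            f"source weights applied: {self.source.source_weights}",
            f"data admitted under: {self.data.selection_criteria}",
            f"inference: {evidence!r} -> {conclusion!r}",
        ]
```

On this toy representation, legitimate cascading corresponds to each layer constraining the one below it through recorded steps, while Authority Pollution would show up as a mutation of a higher layer (say, `source_weights`) that leaves no trace for an auditor; detecting such gaps is what PRISM's six metrics are intended to measure.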