Referential Security as a New Paradigm for AI Evaluations

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current AI safety evaluations attach their findings to static public identifiers, which do not capture undeclared component changes in continuously updated systems. The result is ambiguous evaluation targets and non-reproducible results. This work proposes a new paradigm, "referential security", that treats model identity as an empirically verifiable property and separates referential stability from the safety claims it conditions, so that a later party can determine which system a given claim addressed. By grounding evaluation in verifiable artifacts such as weights, prompts, and retrieval mechanisms, the approach aims to give dynamic AI systems a stable reference and to support reproducible evaluation, longitudinal audit validity, and cross-provider equivalence across the model lifecycle.

📝 Abstract

Security evaluations inherently depend on stable identifiers. Any finding, audit, or regulatory decision must remain attached to the specific artifact it pertains to. Continuously updated artificial intelligence systems violate this core assumption, with public model designations remaining static while underlying weights, prompts, retrieval mechanisms, misuse classifiers, inference settings, and serving infrastructures undergo unannounced modifications. Consequently, current evaluations frequently apply to superficial labels rather than identifiable and distinct systems. To resolve this, we propose referential security as a new paradigm for AI evaluation. The fundamental security question extends beyond whether a model is safe to whether subsequent parties can conclusively determine which system a specific safety claim addressed. This approach reframes model identity as an empirically verifiable property and separates referential stability from the substantive security claims it conditions. This framework brings tractability to three critical workflows that current practices handle poorly. Specifically, it enables reproducible evaluation, longitudinal audit validity, and cross-provider equivalence. By grounding these evaluations in verifiable artifacts, our approach ensures that safety audits and regulatory findings maintain their empirical utility across the operational lifecycle of dynamic systems.
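To make the idea of attaching a claim to an identifiable system concrete, the following is a minimal sketch and not the paper's method. It assumes that each component named in the abstract (weights, prompt, retrieval configuration, inference settings) can be serialized to bytes and hashed, which the abstract does not state. All function names, field names, and example values here are hypothetical.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one serialized component."""
    return hashlib.sha256(data).hexdigest()


def fingerprint(components: dict) -> str:
    """Hash a canonical JSON encoding of the per-component digests.

    Sorting keys makes the result independent of dict insertion order.
    """
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_claim(label: str, components: dict, finding: str) -> dict:
    """Bind a finding to the system fingerprint, not only to the public label."""
    return {
        "label": label,
        "system_fingerprint": fingerprint(components),
        "finding": finding,
    }


def claim_applies(claim: dict, components: dict) -> bool:
    """True only if the given system matches the one the claim was made about."""
    return claim["system_fingerprint"] == fingerprint(components)


# Hypothetical system as it was audited.
audited = {
    "weights": digest(b"weights-v1"),
    "system_prompt": digest(b"You are a helpful assistant."),
    "retrieval_config": digest(b'{"index": "2026-01"}'),
    "inference_settings": digest(b'{"temperature": 0.7}'),
}
claim = make_claim("example-model", audited, "passed hypothetical eval suite")

# Same public label later, but the prompt was changed without announcement.
served = dict(audited, system_prompt=digest(b"You are a concise assistant."))

still_valid = claim_applies(claim, audited)
silently_changed = not claim_applies(claim, served)
```

In this sketch the public label stays the same while the fingerprint changes, so a later auditor can tell that the original finding no longer refers to the served system. Hashing only works for components the evaluator can actually inspect; verifying identity from behavior alone would need a different mechanism.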

Problem

Research questions and friction points this paper is trying to address.

referential security

AI evaluation

model identity

audit validity

dynamic AI systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

referential security

AI evaluation

model identity