🤖 AI Summary
This study addresses a critical limitation of existing AI governance models, which treat user trust as a one-time adoption decision and overlook how trust evolves through repeated interactions, leaving the dynamics that shape system safety unexamined. The work conceptualizes trust as users' reduced monitoring in asymmetric, repeated interactions with developers and, drawing on evolutionary game theory, offers a first account of the co-evolutionary dynamics between user trust and developer safety strategies. Analyses of infinite-population replicator dynamics, finite-population stochastic evolution, and Q-learning simulations identify three long-run equilibria: no adoption with unsafe development, widespread adoption of unsafe systems, and widespread adoption of safe systems. Only the last is desirable, and it is reachable only when penalties for unsafe development are sufficiently stringent and users can sustain low-cost monitoring. The findings underscore that effective governance requires jointly designing affordable monitoring mechanisms and credible punishment schemes, rather than relying on regulation alone or on uncritical trust.
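Stated as a stylised condition (the notation here is illustrative, not the paper's): let $c_s$ be the developer's extra cost of safe development, $p$ the penalty when unsafe behaviour is caught by a monitoring user, $c_m$ the user's monitoring cost, and $h$ the harm from undetected unsafe AI. The safe, widely adopted equilibrium is then sustainable roughly when

$$
p > c_s \qquad \text{and} \qquad c_m < h,
$$

i.e. the penalty must outweigh what a developer saves by skipping safety, and monitoring must stay cheap enough, relative to the harm it averts, that users keep checking at least occasionally.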
📝 Abstract
AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users' trust as a one-shot adoption choice rather than as a dynamic, evolving process shaped by repeated interactions. We instead model trust as reduced monitoring in a repeated, asymmetric interaction between users and AI developers, where checking AI behaviour is costly. Using evolutionary game theory, we study how user trust strategies and developer choices between safe (compliant) and unsafe (non-compliant) AI co-evolve under different levels of monitoring cost and institutional regimes. We complement the infinite-population replicator analysis with stochastic finite-population dynamics and reinforcement learning (Q-learning) simulations. Across these approaches, we find three robust long-run regimes: no adoption with unsafe development, unsafe but widely adopted systems, and safe systems that are widely adopted. Only the last is desirable, and it arises when penalties for unsafe behaviour exceed the extra cost of safety and users can still afford to monitor at least occasionally. Our results formally support governance proposals that emphasise transparency, low-cost monitoring, and meaningful sanctions, and they show that neither regulation alone nor blind user trust is sufficient to prevent evolutionary drift towards unsafe or low-adoption outcomes.
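To make the replicator-dynamics component concrete, here is a minimal sketch (not the authors' code) of two-population replicator dynamics for an asymmetric user–developer game of this kind. The payoff structure and parameter values (`b_u`, `h`, `c_m`, `b_d`, `c_s`, `p`) are hypothetical stand-ins, chosen so that safe development spreads only when the penalty exceeds the extra safety cost, and monitoring persists only when its cost is small relative to the harm it averts.

```python
# Illustrative two-population replicator dynamics for an asymmetric
# user-developer game. Payoffs and parameters are hypothetical stand-ins,
# not the paper's calibration.
import numpy as np

# Hypothetical parameters
b_u = 4.0   # user's benefit from adopting a safe AI system
h   = 6.0   # user's harm from adopting an unsafe AI system unchecked
c_m = 0.5   # user's cost of monitoring (checking AI behaviour)
b_d = 5.0   # developer's revenue when the system is adopted
c_s = 1.0   # developer's extra cost of safe (compliant) development
p   = 3.0   # penalty paid by a developer caught deploying unsafe AI

# Strategies: users choose Monitor (0) or Trust (1);
#             developers choose Safe (0) or Unsafe (1).
# A[i, j] = user payoff, B[i, j] = developer payoff
# when the user plays i against a developer playing j.
A = np.array([
    [b_u - c_m, -c_m],   # Monitor: pays c_m; detects and avoids harm from unsafe AI
    [b_u,       -h  ],   # Trust:   saves c_m, but suffers harm h if the AI is unsafe
])
B = np.array([
    [b_d - c_s, b_d - p],  # vs Monitor: unsafe development is caught and fined
    [b_d - c_s, b_d    ],  # vs Trust:   unsafe development goes unpunished
])

def replicator_step(x, y, dt=0.01):
    """One Euler step of the two-population replicator dynamics.
    x = share of users who Monitor, y = share of developers who are Safe."""
    f_mon    = y * A[0, 0] + (1 - y) * A[0, 1]   # monitoring users' expected payoff
    f_trust  = y * A[1, 0] + (1 - y) * A[1, 1]   # trusting users' expected payoff
    g_safe   = x * B[0, 0] + (1 - x) * B[1, 0]   # safe developers' expected payoff
    g_unsafe = x * B[0, 1] + (1 - x) * B[1, 1]   # unsafe developers' expected payoff
    x_next = x + dt * x * (1 - x) * (f_mon - f_trust)
    y_next = y + dt * y * (1 - y) * (g_safe - g_unsafe)
    return x_next, y_next

x, y = 0.5, 0.5   # initial shares of monitoring users and safe developers
for _ in range(100_000):
    x, y = replicator_step(x, y)
print(f"monitoring users: {x:.3f}, safe developers: {y:.3f}")
```

In this toy parametrisation, safe development gains ground whenever the monitoring share exceeds `c_s / p`, and monitoring pays off whenever the unsafe share exceeds `c_m / h`; lowering the penalty below the safety cost, or raising the monitoring cost towards the harm level, drives the same dynamics towards unsafe or low-monitoring regimes, qualitatively echoing the dependence described above. The paper's actual payoff structure and analysis are richer, including the finite-population and Q-learning variants mentioned in the abstract.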