🤖 AI Summary
This study addresses the challenge of distinguishing genuine model novelty from functional redundancy in heterogeneous AI ecosystems, a critical barrier to trustworthy AI governance. The authors propose a statistical framework grounded in In-Silico Quasi-Experimental Design (ISQED) that isolates a model's intrinsic identity through matched interventions, and they introduce the Peer-Inexpressible Residual (PIER) to quantify model uniqueness. They formally prove that model uniqueness is non-identifiable from observational data alone, derive a provably minimax-optimal scaling law for active auditing, and expose fundamental limitations of cooperative game-theoretic approaches such as Shapley values for detecting redundancy. By integrating adaptive query protocols, the DISCO estimator, and minimax-optimal sampling theory, the framework enables high-precision auditing of model substitutability across diverse domains, including computer vision, large language models, and urban traffic forecasting.
📝 Abstract
As AI systems evolve from isolated predictors into complex, heterogeneous ecosystems of foundation models and specialized adapters, distinguishing genuine behavioral novelty from functional redundancy becomes a critical governance challenge. Here, we introduce a statistical framework for auditing model uniqueness based on In-Silico Quasi-Experimental Design (ISQED). By enforcing matched interventions across models, we isolate intrinsic model identity and quantify uniqueness as the Peer-Inexpressible Residual (PIER): the component of a target's behavior strictly irreducible to any stochastic convex combination of its peers, with vanishing PIER characterizing exactly when routing-based substitution becomes possible. We establish the theoretical foundations of ecosystem auditing through three key contributions. First, we prove a fundamental limitation of observational logs: uniqueness is mathematically non-identifiable without interventional control. Second, we derive a scaling law for active auditing, showing that our adaptive query protocol achieves minimax-optimal sample complexity of order $d\sigma^2\gamma^{-2}\log(Nd/\delta)$. Third, we demonstrate that cooperative game-theoretic methods, such as Shapley values, fundamentally fail to detect redundancy. We implement this framework via the DISCO (Design-Integrated Synthetic Control) estimator and deploy it across diverse ecosystems, including computer vision models (ResNet/ConvNeXt/ViT), large language models (BERT/RoBERTa), and city-scale traffic forecasters. These results move trustworthy AI beyond explaining single models: they establish a principled, intervention-based science of auditing and governing heterogeneous model ecosystems.
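To make the PIER notion concrete, the sketch below treats each model as a vector of responses to the same matched queries and computes the target's residual beyond the best stochastic convex combination (simplex-weighted mixture) of its peers. This is an illustrative toy, not the paper's DISCO estimator: the function names `pier` and `project_simplex`, the projected-gradient solver, and the mean-squared-residual score are our own simplifying assumptions.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex {w >= 0, sum(w) = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def pier(target, peers, n_iter=2000):
    """Toy PIER score: squared error of the target's responses beyond the
    best convex combination of peer responses (hypothetical sketch).

    target: array of shape (n_queries,) -- target model's responses
    peers:  array of shape (n_peers, n_queries) -- peer responses to the
            SAME matched queries (the 'matched intervention' requirement)
    """
    P = np.asarray(peers, dtype=float)
    y = np.asarray(target, dtype=float)
    w = np.full(P.shape[0], 1.0 / P.shape[0])        # start at uniform mixture
    lr = 1.0 / (np.linalg.norm(P, 2) ** 2 + 1e-12)   # step from Lipschitz bound
    for _ in range(n_iter):
        grad = P @ (P.T @ w - y)                     # grad of 0.5*||P'w - y||^2
        w = project_simplex(w - lr * grad)           # keep w a stochastic mixture
    resid = y - P.T @ w
    return float(np.mean(resid ** 2)), w
```

A redundant target (an exact mixture of its peers) drives the score toward zero, matching the abstract's claim that vanishing PIER characterizes routing-based substitutability; a target with behavior outside the peers' convex hull retains a strictly positive residual.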