AI Summary
This study addresses the lack of fine-grained, dynamic evaluation mechanisms for AI systems in the post-deployment phase. We propose the Aggregated Individual Reporting (AIR) framework, which systematically incorporates qualitative, user-generated feedback from real-world interactions into AI evaluation. AIR employs structured report collection, temporal aggregation, and thematic analysis to enable early detection of performance degradation and emergent safety risks. Grounded in democratic AI principles, it specifies an actionable reporting interface, a scalable data-aggregation pipeline, and a responsive decision-making pathway, thereby bridging critical gaps in user-centricity and real-time governance within existing evaluation paradigms. We argue that individual reports can surface previously unanticipated safety issues in black-box models and inform targeted interventions. We further formalize a standardized AIR workflow and outline future research directions toward cross-platform, collaborative AI governance.
Abstract
The need for developing model evaluations beyond static benchmarking, especially in the post-deployment phase, is now well-understood. At the same time, concerns about the concentration of power in deployed AI systems have sparked a keen interest in 'democratic' or 'public' AI. In this work, we bring these two ideas together by proposing mechanisms for aggregated individual reporting (AIR), a framework for post-deployment evaluation that relies on individual reports from the public. An AIR mechanism allows those who interact with a specific, deployed (AI) system to report when they feel that they may have experienced something problematic; these reports are then aggregated over time, with the goal of evaluating the relevant system in a fine-grained manner. This position paper argues that individual experiences should be understood as an integral part of post-deployment evaluation, and that our proposed mechanism of aggregated individual reporting offers a practical path to that end. On the one hand, individual reporting can identify substantively novel insights about safety and performance; on the other, aggregation can be uniquely useful for informing action. From a normative perspective, the post-deployment phase completes a missing piece in the conversation about 'democratic' AI. As a pathway to implementation, we provide a workflow of concrete design decisions and pointers to areas requiring further research and methodological development.
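The core AIR loop described in the abstract (collect individual reports, aggregate them over time, and flag recurring problems to inform action) can be sketched in a few lines. This is a minimal illustration only: the report schema, the time window, and the count threshold below are our own assumptions, not design choices specified by the paper.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Report:
    """One individual report about a deployed system (hypothetical schema)."""
    timestamp: datetime
    category: str      # e.g. "harmful output", "factual error"
    description: str   # free-text account of the experience


def flag_issues(reports: list[Report],
                window: timedelta,
                threshold: int,
                now: datetime) -> dict[str, int]:
    """Aggregate reports within a recent time window and flag any
    category whose report count meets the threshold."""
    recent = [r for r in reports if now - r.timestamp <= window]
    counts = Counter(r.category for r in recent)
    return {cat: n for cat, n in counts.items() if n >= threshold}
```

In practice, the aggregation step would be far richer (thematic clustering of free text, deduplication, reporter weighting), but even this counting sketch shows how temporal aggregation turns scattered individual experiences into an actionable signal.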