Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies a significant coverage gap and collaboration deficit between first-party (developer-led) and third-party (academic, NGO, etc.) AI social impact assessments—spanning bias, fairness, privacy, environmental cost, and labor practices. Methodologically, it conducts the first systematic comparative analysis of 186 first-party reports and 183 third-party evaluations, integrating content analysis, quantitative statistics, and in-depth interviews with AI developers. Results reveal persistent under-disclosure by first parties on critical issues, while third-party assessments—though deeper—remain constrained by data opacity and lack of access to proprietary infrastructure and internal documentation. The core contribution is the empirical identification of structural imbalances within the AI assessment ecosystem. The paper proposes a shared infrastructure framework to integrate independent evaluations, enhance verifiability, and strengthen accountability—thereby establishing a methodological foundation and actionable policy pathway for robust AI governance.

📝 Abstract
Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor practices remain uneven across the AI ecosystem. To characterize this landscape, we conduct the first comprehensive analysis of both first-party and third-party social impact evaluation reporting across a wide range of model developers. Our study examines 186 first-party release reports and 183 post-release evaluation sources, and complements this quantitative analysis with interviews of model developers. We find a clear division of evaluation labor: first-party reporting is sparse, often superficial, and has declined over time in key areas such as environmental impact and bias, while third-party evaluators including academic researchers, nonprofits, and independent organizations provide broader and more rigorous coverage of bias, harmful content, and performance disparities. However, this complementarity has limits. Only model developers can authoritatively report on data provenance, content moderation labor, financial costs, and training infrastructure, yet interviews reveal that these disclosures are often deprioritized unless tied to product adoption or regulatory compliance. Our findings indicate that current evaluation practices leave major gaps in assessing AI's societal impacts, highlighting the urgent need for policies that promote developer transparency, strengthen independent evaluation ecosystems, and create shared infrastructure to aggregate and compare third-party evaluations in a consistent and accessible way.
Problem

Research questions and friction points this paper is trying to address.

Mapping coverage gaps between first-party and third-party AI social impact evaluations
Analyzing sparse developer reporting on bias, environmental impact, and labor practices
Identifying the need for policies that promote transparency and independent evaluation ecosystems
Innovation

Methods, ideas, or system contributions that make the work stand out.

First comprehensive analysis of first-party and third-party social impact evaluation reporting, spanning 186 release reports and 183 post-release sources (a minimal coverage-comparison sketch follows this list)
Interviews with model developers complement the quantitative content analysis
Proposed policy directions: developer transparency, stronger independent evaluation ecosystems, and shared infrastructure to aggregate and compare third-party evaluations
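The coverage comparison at the heart of the study can be pictured as a simple tabulation: given reports hand-coded for the social-impact categories they address, count how often each category appears in the first-party versus the third-party corpus. The sketch below is a hypothetical illustration under that assumption; the category names, function, and toy data are invented for exposition and are not the authors' code or data.

```python
# A minimal, hypothetical sketch (not the authors' released pipeline):
# compare how often each social-impact category is covered in first-party
# versus third-party reports, assuming every report has already been
# hand-coded with the set of categories it addresses.
from collections import Counter

CATEGORIES = ["bias", "privacy", "environment", "labor", "harmful_content"]

def coverage_rates(coded_reports):
    """Return the fraction of reports covering each category."""
    counts = Counter(cat for report in coded_reports
                     for cat in report if cat in CATEGORIES)
    n = max(len(coded_reports), 1)
    return {cat: counts[cat] / n for cat in CATEGORIES}

# Toy stand-ins for the coded corpora of 186 first-party release reports
# and 183 third-party sources described in the paper.
first_party = [{"bias"}, {"privacy", "bias"}, set()]
third_party = [{"bias", "harmful_content"}, {"environment"}, {"bias", "labor"}]

for label, corpus in [("first-party", first_party), ("third-party", third_party)]:
    print(label, coverage_rates(corpus))
```

On the real corpora, per-category rates like these (optionally broken out by year) would surface the kind of gaps the paper reports, such as thinner first-party coverage of environmental impact and bias relative to third-party sources.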
Anka Reuel
CS Ph.D. Candidate, Stanford University
AI Governance, Responsible AI, AI Ethics, AI Safety
Avijit Ghosh
Hugging Face
Jenny Chim
Queen Mary University of London
Natural Language Processing, Computational Linguistics
Andrew Tran
Temple University
Generative AI, Human-Computer Interaction
Yanan Long
University of Chicago
AI for Science, Bayesian Statistics, Geometric Deep Learning, Natural Language Processing, AI Ethics
Jennifer Mickel
UT Austin
Usman Gohar
Iowa State University
Machine Learning, Artificial Intelligence, Fairness in Machine Learning, Software Engineering
Srishti Yadav
University of Copenhagen
Pawan Sasanka Ammanamanchi
IIIT Hyderabad
Natural Language Processing, Deep Learning
Mowafak Allaham
Northwestern University
Hossein A. Rahmani
PhD Student, University College London
Natural Language Processing, Information Retrieval, Machine Learning
Mubashara Akhtar
ETH AI Center Fellow at ETH Zurich
NLP, Multimodality, Benchmarking & Evaluation
Felix Friedrich
Postdoc @ Meta FAIR, Montreal
Multimodal AI, Generative AI, AI Alignment, AI Safety
Robert Scholz
Max Planck School of Cognition
M. A. Riegler
Simula
Jan Batzner
Weizenbaum Institute, Technical University Munich
Eliya Habba
Hebrew University of Jerusalem
Arushi Saxena
Integrity Institute
Anastassia Kornilova
Trustible
Kevin L. Wei
RAND; Harvard Law School
AI Evaluation, AI Safety, AI Governance, Private Law, Empirical Legal Studies
Prajna Soni
Alinia AI
Yohan Mathew
Independent
Kevin Klyman
Stanford, Harvard
Foundation Models, AI Regulation, Geopolitics
Jeba Sania
Harvard University
Subramanyam Sahoo
Berkeley AI Safety Initiative (BASIS)
O. Bruvik
Stanford University
Pouya Sadeghi
Computer Science student, University of Waterloo
Sujata Goswami
Berkeley National Laboratory
Angelina Wang
Cornell Tech
Machine Learning Fairness, Evaluation and Measurement
Yacine Jernite
Research Scientist, HuggingFace
Machine Learning, Natural Language Processing
Zeerak Talat
University of Edinburgh
NLP, Online Abuse, Hate Speech, STS, Media Studies
Stella Biderman
EleutherAI
Natural Language Processing, Artificial Intelligence, Language Modeling, Deep Learning
Mykel J. Kochenderfer
Associate Professor, Stanford University
Artificial Intelligence, Machine Learning, Decision Theory, Safety
Sanmi Koyejo
Assistant Professor, Stanford University
Machine Learning, Healthcare AI, Neuroinformatics
Irene Solaiman
Hugging Face
Artificial Intelligence