A Different Approach to AI Safety: Proceedings from the Columbia Convening on Openness in Artificial Intelligence and AI Safety

📅 2025-06-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This project addresses three critical challenges in AI safety posed by open-weight and open-source foundation models: (1) the absence of multimodal and multilingual evaluation benchmarks; (2) weak defenses in agent systems against prompt injection and compositional attacks; and (3) insufficient participation from communities most adversely affected by AI harms. Employing participatory workshops, the methodology integrates technical intervention mapping, open-source tool assessment, multimodal benchmark construction, robust prompt injection mitigation, and agent security framework design. The project innovatively advances the “openness-for-safety” paradigm, establishing the first globally coordinated, open-source AI safety governance pathway. Key contributions include: the inaugural Content Safety Filter Ecosystem Map; an Open Deployment Tools Atlas; a cross-cultural participation mechanism; and a Safety–Openness Interdisciplinary Research Roadmap. These outputs have directly informed policy formulation for the 2025 French AI Action Summit.

Technology Category

Application Category

📝 Abstract
The rapid rise of open-weight and open-source foundation models is intensifying the obligation and reshaping the opportunity to make AI systems safe. This paper reports outcomes from the Columbia Convening on AI Openness and Safety (San Francisco, 19 Nov 2024) and its six-week preparatory programme involving more than forty-five researchers, engineers, and policy leaders from academia, industry, civil society, and government. Using a participatory, solutions-oriented process, the working groups produced (i) a research agenda at the intersection of safety and open source AI; (ii) a mapping of existing and needed technical interventions and open source tools to safely and responsibly deploy open foundation models across the AI development workflow; and (iii) a mapping of the content safety filter ecosystem with a proposed roadmap for future research and development. We find that openness -- understood as transparent weights, interoperable tooling, and public governance -- can enhance safety by enabling independent scrutiny, decentralized mitigation, and culturally plural oversight. However, significant gaps persist: scarce multimodal and multilingual benchmarks, limited defenses against prompt-injection and compositional attacks in agentic systems, and insufficient participatory mechanisms for communities most affected by AI harms. The paper concludes with a roadmap of five priority research directions, emphasizing participatory inputs, future-proof content filters, ecosystem-wide safety infrastructure, rigorous agentic safeguards, and expanded harm taxonomies. These recommendations informed the February 2025 French AI Action Summit and lay groundwork for an open, plural, and accountable AI safety discipline.
Problem

Research questions and friction points this paper is trying to address.

Addressing AI safety through openness and transparency in models
Identifying gaps in benchmarks and defenses for AI systems
Developing participatory mechanisms for AI harm mitigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Participatory solutions-oriented process for AI safety
Mapping technical interventions for open foundation models
Roadmap for future content safety filter research
C
Camille François
Columbia University
L
Ludovic Péran
Columbia University
A
Ayah Bdeir
Mozilla
Nouha Dziri
Nouha Dziri
Allen Institute for AI (Ai2)
Artificial IntelligenceNatural Language Processing
Will Hawkins
Will Hawkins
University of Bath
Sustainable ConstructionStructural OptimisationEmbodied Carbon
Yacine Jernite
Yacine Jernite
Research Scientist, HuggingFace
Machine LearningNatural Language Processing
Sayash Kapoor
Sayash Kapoor
CS PhD, Princeton University
ReproducibilityAI agentsSocietal impacts
J
Juliet Shen
Columbia University, ROOST
Heidy Khlaaf
Heidy Khlaaf
Chief AI Scientist, AI Now Institute
AI assuranceformal verificationmachine learningsystems engineeringsafety auditing
Kevin Klyman
Kevin Klyman
Stanford, Harvard
Foundation ModelsAI RegulationGeopolitics
N
Nik Marda
Mozilla
M
Marie Pellat
Mistral
D
Deb Raji
Mozilla
D
Divya Siddarth
Collective Intelligence Project
A
Aviya Skowron
EleutherAI
J
Joseph Spisak
Meta
M
Madhulika Srikumar
Partnership on AI
V
Victor Storchan
Mozilla
A
Audrey Tang
Taiwan Digital Affairs
J
Jen Weedon
Columbia University