🤖 AI Summary
This project addresses three critical challenges in AI safety posed by open-weight and open-source foundation models: (1) the scarcity of multimodal and multilingual evaluation benchmarks; (2) weak defenses in agentic systems against prompt-injection and compositional attacks; and (3) insufficient participation from the communities most adversely affected by AI harms. Through participatory workshops, the convening mapped existing and needed technical interventions, assessed open-source deployment tools, surveyed the content safety filter ecosystem, and set out priority directions for multimodal benchmark construction, prompt-injection mitigation, and agent security framework design. The project advances the “openness-for-safety” paradigm, treating transparent weights, interoperable tooling, and public governance as enablers of independent scrutiny and culturally plural oversight. Key contributions include: a Content Safety Filter Ecosystem Map; an Open Deployment Tools Atlas; a cross-cultural participation mechanism; and a Safety–Openness Interdisciplinary Research Roadmap. These outputs directly informed policy formulation for the February 2025 French AI Action Summit.
📝 Abstract
The rapid rise of open-weight and open-source foundation models is intensifying the obligation, and reshaping the opportunity, to make AI systems safe. This paper reports outcomes from the Columbia Convening on AI Openness and Safety (San Francisco, 19 Nov 2024) and its six-week preparatory programme involving more than forty-five researchers, engineers, and policy leaders from academia, industry, civil society, and government. Using a participatory, solutions-oriented process, the working groups produced (i) a research agenda at the intersection of safety and open-source AI; (ii) a mapping of existing and needed technical interventions and open-source tools for safely and responsibly deploying open foundation models across the AI development workflow; and (iii) a mapping of the content safety filter ecosystem, with a proposed roadmap for future research and development. We find that openness, understood as transparent weights, interoperable tooling, and public governance, can enhance safety by enabling independent scrutiny, decentralized mitigation, and culturally plural oversight. However, significant gaps persist: scarce multimodal and multilingual benchmarks, limited defenses against prompt-injection and compositional attacks in agentic systems, and insufficient participatory mechanisms for the communities most affected by AI harms. The paper concludes with a roadmap of five priority research directions, emphasizing participatory inputs, future-proof content filters, ecosystem-wide safety infrastructure, rigorous agentic safeguards, and expanded harm taxonomies. These recommendations informed the February 2025 French AI Action Summit and lay the groundwork for an open, plural, and accountable AI safety discipline.