🤖 AI Summary
This project addresses three critical challenges in AI safety posed by open-weight and open-source foundation models: (1) the scarcity of multimodal and multilingual evaluation benchmarks; (2) weak defenses in agentic systems against prompt-injection and compositional attacks; and (3) insufficient participation from the communities most adversely affected by AI harms. Through participatory workshops, the convening mapped existing and needed technical interventions, assessed open-source deployment tools, surveyed the content safety filter ecosystem, and set out priority directions for multimodal benchmark construction, prompt-injection mitigation, and agent security framework design. The project advances the “openness-for-safety” paradigm, treating transparent weights, interoperable tooling, and public governance as enablers of independent scrutiny and culturally plural oversight. Key contributions include: a Content Safety Filter Ecosystem Map; an Open Deployment Tools Atlas; a cross-cultural participation mechanism; and a Safety–Openness Interdisciplinary Research Roadmap. These outputs directly informed policy formulation for the February 2025 French AI Action Summit.
📝 Abstract
The rapid rise of open-weight and open-source foundation models is intensifying the obligation, and reshaping the opportunity, to make AI systems safe. This paper reports outcomes from the Columbia Convening on AI Openness and Safety (San Francisco, 19 Nov 2024) and its six-week preparatory programme involving more than forty-five researchers, engineers, and policy leaders from academia, industry, civil society, and government. Using a participatory, solutions-oriented process, the working groups produced (i) a research agenda at the intersection of safety and open-source AI; (ii) a mapping of existing and needed technical interventions and open-source tools for safely and responsibly deploying open foundation models across the AI development workflow; and (iii) a mapping of the content safety filter ecosystem, with a proposed roadmap for future research and development. We find that openness, understood as transparent weights, interoperable tooling, and public governance, can enhance safety by enabling independent scrutiny, decentralized mitigation, and culturally plural oversight. However, significant gaps persist: scarce multimodal and multilingual benchmarks, limited defenses against prompt-injection and compositional attacks in agentic systems, and insufficient participatory mechanisms for the communities most affected by AI harms. The paper concludes with a roadmap of five priority research directions, emphasizing participatory inputs, future-proof content filters, ecosystem-wide safety infrastructure, rigorous agentic safeguards, and expanded harm taxonomies. These recommendations informed the February 2025 French AI Action Summit and lay the groundwork for an open, plural, and accountable AI safety discipline.