🤖 AI Summary
Existing LLM evaluation frameworks prioritize capability assessment while neglecting safety, and unidimensional weighted rankings lack interpretability. To address these limitations, this paper proposes the first dynamic leaderboard framework that jointly optimizes for both capability and safety. Methodologically, it integrates an interactive LLM arena with a novel multi-objective optimization algorithm—“Distance-to-Optimal-Score”—to achieve interpretable, balanced trade-offs between capability and safety metrics. The framework incorporates comprehensive safety evaluations (e.g., jailbreaking, bias, hallucination) and a dynamic leaderboard mechanism that adapts to evolving model releases. In its inaugural benchmarking round, 26 mainstream models were assessed, uncovering pervasive critical safety vulnerabilities—even among state-of-the-art models. The resulting framework delivers a reproducible, extensible, and responsible AI benchmarking suite, establishing a new paradigm for dual-objective LLM evaluation.
📝 Abstract
Existing evaluation frameworks emphasize model capability while largely overlooking safety. To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to compute overall rankings. This approach incentivizes models to achieve balance rather than excelling in one dimension at the expense of the other. In its first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
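To illustrate why distance-to-optimal scoring rewards balance where simple averaging does not, here is a minimal sketch. The exact formula is an assumption (not taken from the paper): scores are normalized to [0, 1] and the final score is one minus the Euclidean distance to the ideal point (1, 1), rescaled to [0, 1].

```python
import math

def distance_to_optimal_score(capability: float, safety: float) -> float:
    """Illustrative distance-to-optimal scoring (assumed form, not
    necessarily the paper's exact formula). Both inputs lie in [0, 1];
    the score is 1 minus the Euclidean distance to the ideal point
    (1, 1), divided by sqrt(2) so that (0, 0) maps to 0 and (1, 1)
    maps to 1."""
    dist = math.hypot(1.0 - capability, 1.0 - safety)
    return 1.0 - dist / math.sqrt(2.0)

# Two models with the same arithmetic mean (0.8) of the two metrics:
balanced = distance_to_optimal_score(0.8, 0.8)   # even trade-off
lopsided = distance_to_optimal_score(1.0, 0.6)   # capability-heavy
print(balanced > lopsided)  # True: the balanced model ranks higher
```

Under plain averaging both hypothetical models would tie at 0.8; the distance-based score penalizes the lopsided model, which is the incentive the abstract describes.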