🤖 AI Summary
Existing LLM evaluation frameworks prioritize capability assessment while neglecting safety, and unidimensional weighted rankings lack interpretability. To address these limitations, this paper proposes the first dynamic leaderboard framework that jointly optimizes for both capability and safety. Methodologically, it integrates an interactive LLM arena with a novel multi-objective optimization algorithm—“Distance-to-Optimal-Score”—to achieve interpretable, balanced trade-offs between capability and safety metrics. The framework incorporates comprehensive safety evaluations (e.g., jailbreaking, bias, hallucination) and a dynamic leaderboard mechanism that adapts to evolving model releases. In its inaugural benchmarking round, 26 mainstream models were assessed, uncovering pervasive critical safety vulnerabilities—even among state-of-the-art models. The resulting framework delivers a reproducible, extensible, and responsible AI benchmarking suite, establishing a new paradigm for dual-objective LLM evaluation.
📝 Abstract
Existing evaluation frameworks emphasize model capability while largely overlooking safety. To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to compute overall rankings. This approach incentivizes models to achieve balance rather than excelling in one dimension at the expense of the other. In its first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
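To illustrate why distance-to-optimal scoring rewards balance where simple averaging does not, here is a minimal sketch. The exact formula is an assumption (not taken from the paper): scores are normalized to [0, 1] and the final score is one minus the Euclidean distance to the ideal point (1, 1), rescaled to [0, 1].

```python
import math

def distance_to_optimal_score(capability: float, safety: float) -> float:
    """Illustrative distance-to-optimal scoring (assumed form, not
    necessarily the paper's exact formula). Both inputs lie in [0, 1];
    the score is 1 minus the Euclidean distance to the ideal point
    (1, 1), divided by sqrt(2) so that (0, 0) maps to 0 and (1, 1)
    maps to 1."""
    dist = math.hypot(1.0 - capability, 1.0 - safety)
    return 1.0 - dist / math.sqrt(2.0)

# Two models with the same arithmetic mean (0.8) of the two metrics:
balanced = distance_to_optimal_score(0.8, 0.8)   # even trade-off
lopsided = distance_to_optimal_score(1.0, 0.6)   # capability-heavy
print(balanced > lopsided)  # True: the balanced model ranks higher
```

Under plain averaging both hypothetical models would tie at 0.8; the distance-based score penalizes the lopsided model, which is the incentive the abstract describes.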