Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

📅 2024-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM evaluation frameworks prioritize capability assessment while neglecting safety, and unidimensional weighted rankings lack interpretability. To address these limitations, this paper proposes the first dynamic leaderboard framework that jointly optimizes for both capability and safety. Methodologically, it integrates an interactive LLM arena with a novel multi-objective optimization algorithm—“Distance-to-Optimal-Score”—to achieve interpretable, balanced trade-offs between capability and safety metrics. The framework incorporates comprehensive safety evaluations (e.g., jailbreaking, bias, hallucination) and a dynamic leaderboard mechanism that adapts to evolving model releases. In its inaugural benchmarking round, 26 mainstream models were assessed, uncovering pervasive critical safety vulnerabilities—even among state-of-the-art models. The resulting framework delivers a reproducible, extensible, and responsible AI benchmarking suite, establishing a new paradigm for dual-objective LLM evaluation.

📝 Abstract
To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of the other. In its first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
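The abstract names the distance-to-optimal-score method but does not give its formula. A minimal sketch of the idea, assuming Euclidean distance to the ideal point (100, 100) on 0–100 capability and safety scales (the function name and normalization are illustrative assumptions, not the paper's exact definition):

```python
import math

def distance_to_optimal_score(capability: float, safety: float,
                              optimal: float = 100.0) -> float:
    """Score a model by its closeness to the ideal point (optimal, optimal).

    Inputs are assumed to lie on a 0-100 scale; a smaller distance to the
    ideal point yields a higher score, so balanced models are rewarded.
    """
    dist = math.hypot(optimal - capability, optimal - safety)
    max_dist = math.hypot(optimal, optimal)  # worst case: scores of (0, 0)
    return 100.0 * (1.0 - dist / max_dist)

# Unlike a plain average, this favors balance: a (80, 80) model
# outranks a (100, 60) model even though both average 80.
balanced = distance_to_optimal_score(80, 80)
lopsided = distance_to_optimal_score(100, 60)
assert balanced > lopsided
```

The intuition is geometric: averaging scores makes all points on a diagonal line equivalent, whereas distance-to-optimal penalizes any large shortfall on either axis, so a model cannot climb the ranking by sacrificing safety for capability.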
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Capability and Safety Evaluation
Responsible AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Libra-Leaderboard
Comprehensive Evaluation
Responsible AI
👥 Authors

Haonan Li
LibrAI, MBZUAI
Xudong Han
LibrAI, MBZUAI
Zenan Zhai
LibrAI, Oracle
Honglin Mu
LibrAI
Hao Wang
LibrAI
Zhenxuan Zhang
Georgia Institute of Technology
Yilin Geng
LibrAI, The University of Melbourne
Shom Lin
LibrAI, Tsinghua University
Renxi Wang
MBZUAI
Natural Language Processing
Artem Shelmanov
MBZUAI
uncertainty estimation, fairness, active learning, NLP, deep learning
Xiangyu Qi
Princeton University
Yuxia Wang
MBZUAI
Natural Language Processing
Donghai Hong
Peking University
AI Safety, AI Alignment, Multi-Modal Model
Youliang Yuan
CUHK
Meng Chen
BUPT
Haoqin Tu
University of California Santa Cruz
natural language processing, generation, multimodal
Fajri Koto
Assistant Professor (tenure-track), MBZUAI
Computational Linguistics, Natural Language Processing, Multilingual NLP, Human-centered NLP
Tatsuki Kuribayashi
MBZUAI
Natural Language Processing, Computational Psycholinguistics
Cong Zeng
MBZUAI
Rishabh Bhardwaj
Singapore University of Technology and Design
Natural Language Processing, Machine Learning
Bingchen Zhao
University of Edinburgh
Artificial Intelligence, Knowledge Discovery
Yawen Duan
University of Cambridge
Deep Learning, Artificial Intelligence, AI Safety
Yi Liu
NTU
Emad A. Alghamdi
King Abdulaziz University
Yaodong Yang
Peking University
Yinpeng Dong
Tsinghua University
Machine Learning, Deep Learning, AI Safety
Soujanya Poria
SUTD
Pengfei Liu
Shanghai Jiao Tong University
Zhengzhong Liu
Institute of Foundation Models
Natural Language Processing, Machine Learning
Xuguang Ren
MBZUAI
Eduard Hovy
University of Melbourne, CMU
NLP, AI
Iryna Gurevych
Full Professor, TU Darmstadt; Adjunct Professor, MBZUAI, UAE; Affiliated Professor, INSAIT, Bulgaria
Natural Language Processing, Large Language Models, Artificial Intelligence
Preslav Nakov
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Computational Linguistics, Large Language Models, Fact-checking, Fake News
Monojit Choudhury
Professor of Natural Language Processing, MBZUAI
Natural Language Processing, Large Language Models, Ethics of AI, Computational Social Science
Timothy Baldwin
MBZUAI and The University of Melbourne
computational linguistics, natural language processing, artificial intelligence