AGI-Elo: How Far Are We From Mastering A Task?

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AGI evaluation lacks a cross-modal unified framework that jointly characterizes task difficulty and model/human capability, hindering systematic analysis of capability gaps and long-tail challenges across vision, language, and action domains. To address this, we propose the first Elo-based dynamic rating system for joint cross-modal (vision–language–action) assessment, moving beyond unidimensional accuracy metrics. Our method integrates multi-source benchmarks—including VQA, RLBench, and BIG-bench—via Bayesian modeling and an iterative adversarial scoring algorithm, enabling fine-grained, difficulty-aware, bidirectional evaluation of both tasks and models. The system generates interpretable difficulty–capability distribution maps and quantifies the gap between current models and full task mastery. Extensive validation across diverse AGI scenarios demonstrates robustness and strong generalization capability.
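The core mechanic can be sketched as a standard Elo exchange in which each test case is treated as an opponent: a model "wins" a match by solving the case, gains rating by solving hard cases, and a case that defeats models accrues difficulty rating. The sketch below uses the classic Elo defaults (K = 32, logistic scale 400, baseline 1500); these constants and function names are illustrative assumptions, not the paper's exact formulation.

```python
def expected_score(r_model, r_task, scale=400.0):
    """Elo-style probability that the model solves ('beats') the test case."""
    return 1.0 / (1.0 + 10 ** ((r_task - r_model) / scale))

def elo_update(r_model, r_task, solved, k=32.0):
    """One model-vs-task 'match'. The model's rating rises when it solves a
    case (more so if the case outrated it); the task's difficulty rating
    moves by the opposite amount.
    NOTE: k and the 400-point scale are conventional Elo defaults, assumed
    here for illustration."""
    e = expected_score(r_model, r_task)
    s = 1.0 if solved else 0.0
    delta = k * (s - e)
    return r_model + delta, r_task - delta

# Models and test cases start from a common baseline rating.
r_m, r_t = 1500.0, 1500.0
r_m, r_t = elo_update(r_m, r_t, solved=True)  # model solves the case; its rating rises
```

Iterating such updates over many model-vs-case matches is what yields the paper's rating distributions: the gap between a model's rating and the high-rated tail of test cases quantifies how far it remains from full task mastery.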

📝 Abstract
As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unlike existing metrics that focus solely on models, our approach allows for fine-grained, difficulty-aware evaluations through competitive interactions between models and tasks, capturing both the long-tail distribution of real-world challenges and the competency gap between current models and full task mastery. We validate the generalizability and robustness of our system through extensive experiments on multiple established datasets and models across distinct AGI domains. The resulting rating distributions offer novel perspectives and interpretable insights into task difficulty, model progression, and the outstanding challenges that remain on the path to achieving full AGI task mastery.
Problem

Research questions and friction points this paper is trying to address.

Aggregate, model-only performance metrics obscure per-case task difficulty and long-tail challenges
No unified framework jointly evaluates AI (or human) competency across vision, language, and action domains
No principled way to quantify how far current models remain from full task mastery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Elo-based rating system that jointly rates model competency and test-case difficulty
Fine-grained, difficulty-aware evaluation through competitive interactions between models and tasks
Generalizability and robustness validated on established datasets and models across distinct AGI domains
👥 Authors
Shuo Sun, Johns Hopkins University
Yimin Zhao, National University of Singapore (Robotics, EEG, Deep Learning)
Christina Dao Wen Lee, National University of Singapore
Jiawei Sun, National University of Singapore
Chengran Yuan, National University of Singapore
Zefan Huang, National University of Singapore (Robotics, Autonomous Vehicles, Artificial Intelligence)
Dongen Li, National University of Singapore and Singapore MIT Alliance for Research and Technology
Justin KW Yeoh, National University of Singapore
Alok Prakash, Scientific Director, Singapore MIT Alliance for Research and Technology (SMART) (Intelligent Transportation Systems: first mile/last mile problem; DL Inference on Embedded/Mobile Platforms; Edge AI)
Thomas W. Malone, Massachusetts Institute of Technology and Singapore MIT Alliance for Research and Technology
Marcelo H. Ang, National University of Singapore and Singapore MIT Alliance for Research and Technology