MetaLead: A Comprehensive Human-Curated Leaderboard Dataset for Transparent Reporting of Machine Learning Experiments

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current machine learning leaderboards predominantly rely on manually curated entries, and automated dataset collection efforts typically capture only the best-reported results from papers, lacking comprehensive experimental records and fine-grained metadata. This limits transparent and comparable model evaluation. To address this gap, this work proposes MetaLead—a meticulously annotated, structured dataset that systematically compiles all experimental results reported in research papers, explicitly labeling each experiment’s type (e.g., baseline, proposed method, or its variants) and clearly indicating the separation between training and test datasets. By preserving full experimental context, MetaLead substantially enhances leaderboard transparency and analytical depth, offering a high-fidelity, context-rich benchmark resource for reliable cross-study and cross-domain model performance comparison.

📝 Abstract
Leaderboards are crucial in machine learning (ML) for benchmarking and tracking progress, but creating them traditionally demands significant manual effort. In recent years, efforts have been made to automate leaderboard generation, yet existing datasets for this purpose capture only the best result from each paper and carry limited metadata. We present MetaLead, a fully human-annotated ML leaderboard dataset that captures all experimental results for result transparency and includes additional metadata: each result is labeled with its experiment type (baseline, proposed method, or variation of the proposed method) to enable experiment-type guided comparisons, and train and test datasets are recorded separately to support cross-domain assessment. This enriched structure makes MetaLead a powerful resource for more transparent and nuanced evaluations across ML research.
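To make the enriched structure concrete, the sketch below shows what a MetaLead-style record and an experiment-type guided comparison could look like. The field names, entry values, and scores are illustrative assumptions for this summary, not the dataset's actual schema or contents.

```python
# Hypothetical sketch of a MetaLead-style leaderboard record.
# Field names and all values below are made-up illustrations,
# not the real MetaLead schema or reported results.
from dataclasses import dataclass

@dataclass
class LeaderboardEntry:
    task: str
    model: str
    experiment_type: str   # "baseline" | "proposed" | "proposed-variant"
    train_dataset: str     # training corpus, recorded separately from the test set
    test_dataset: str      # evaluation corpus (enables cross-domain assessment)
    metric: str
    score: float

entries = [
    LeaderboardEntry("NER", "BiLSTM-CRF", "baseline", "CoNLL-2003", "CoNLL-2003", "F1", 91.0),
    LeaderboardEntry("NER", "OurModel", "proposed", "CoNLL-2003", "CoNLL-2003", "F1", 93.2),
    LeaderboardEntry("NER", "OurModel-noCRF", "proposed-variant", "CoNLL-2003", "CoNLL-2003", "F1", 92.1),
    LeaderboardEntry("NER", "OurModel", "proposed", "CoNLL-2003", "WNUT-17", "F1", 55.4),
]

# Experiment-type guided comparison: best proposed-method result per test set.
best = {}
for e in entries:
    if e.experiment_type == "proposed":
        if e.test_dataset not in best or e.score > best[e.test_dataset].score:
            best[e.test_dataset] = e

# Cross-domain rows are exactly those where train and test datasets differ.
cross_domain = [e for e in entries if e.train_dataset != e.test_dataset]
```

Because every result (not only the best one) is retained and typed, such a structure lets readers filter baselines from proposed methods and isolate cross-domain evaluations that a best-result-only leaderboard would hide.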
Problem

Research questions and friction points this paper is trying to address.

leaderboard
machine learning
result transparency
metadata
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

leaderboard dataset
human-curated
result transparency
experimental type annotation
cross-domain assessment
Roelien C. Timmer
CSIRO Data61, Australia
Necva Bolucu
CSIRO Data61, Australia
Stephen Wan
CSIRO Data61, Australia
computational linguistics