CUBE: A Standard for Unifying Agent Benchmarks

📅 2026-03-16
📝 Abstract
The proliferation of agent benchmarks has created critical fragmentation that threatens research productivity. Each new benchmark requires substantial custom integration, creating an "integration tax" that limits comprehensive evaluation. We propose CUBE (Common Unified Benchmark Environments), a universal protocol standard built on MCP and Gym that allows benchmarks to be wrapped once and used everywhere. By separating task, benchmark, package, and registry concerns into distinct API layers, CUBE enables any compliant platform to access any compliant benchmark for evaluation, RL training, or data generation without custom integration. We call on the community to contribute to the development of this standard before platform-specific implementations deepen fragmentation as benchmark production accelerates through 2026.
Alexandre Lacoste
Staff Research Scientist, ServiceNow Research
machine learning
Nicolas Gontier
ServiceNow Research
Dialog Systems, Reasoning, Deep Learning, Natural Language Processing
Oleh Shliazhko
ServiceNow AI Research
Aman Jaiswal
Dalhousie University
NLP
Kusha Sareen
McGill University, Mila
Deep Learning, Geometric Deep Learning, Generative Modelling
Shailesh Nanisetty
ServiceNow AI Research
Joan Cabezas
Manuel Del Verme
McGill / MILA
AI Agents / Reinforcement Learning
Omar G. Younis
Silverstream.ai
Simone Baratta
Silverstream.ai
Matteo Avalle
Silverstream.ai
Imene Kerboua
McGill, Mila
Xing Han Lù
PhD Student at McGill University; Mila
Natural Language Processing, Machine Learning
Elron Bandel
Research Scientist, IBM Research AI
NLP, AI, Deep Learning, Language Models, Text Generation
Michal Shmueli-Scheuer
IBM Research
HCI, NLP, NLG
Asaf Yehudai
PhD Candidate
Natural Language Processing, Machine Translation
Leshem Choshen
MIT, IBM AI research
Model Recycling, Evolving Collaborative Pretraining, Evaluation, Model Merging, Open the Black Box
Jonathan Lebensold
Jetty
Sean Hughes
ServiceNow AI Research
Massimo Caccia
ServiceNow Research
Deep Learning
Alexandre Drouin
Head of Frontier AI Research, ServiceNow Research - Adjunct Professor, ULaval, Mila
Machine Learning, Deep Learning, Causal Inference, Computational Biology
Siva Reddy
McGill University, Mila Quebec AI Institute
Natural Language Processing, Computational Linguistics, Deep Learning, Semantics
Tao Yu
HKU
Yu Su
OSU
Graham Neubig
Carnegie Mellon University, All Hands AI
Natural Language Processing, Machine Learning, Artificial Intelligence
Dawn Song
Professor of Computer Science, UC Berkeley
Computer Security and Privacy