Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations

📅 2026-05-28
🤖 AI Summary
This work addresses the persistent challenge of reproducibility in machine learning research, which is often hindered by underspecified execution details and brittle software environments. The authors propose Croissant Tasks, a declarative, machine-actionable metadata format that decouples the task specification from any concrete implementation by abstracting low-level details into high-level descriptions. They also contribute an automated LLM pipeline that retrofits existing benchmarks into this format. Empirical validation shows that autonomous agents can ingest these specifications and generate functional, accurate reproduction pipelines from scratch, without relying on the original implementation. This supports conceptual reproducibility: verifying claims through independent implementations rather than by replicating source code.
📝 Abstract
Reproducibility is fundamental to the scientific method, yet remains a critical challenge in machine learning. Contributing factors include underspecified execution details and brittle software environments. Human-centric remedies, such as checklists and manual verification, help but require intensive effort and fail to scale. To address this, we introduce Croissant Tasks: a declarative, machine-actionable metadata format that abstracts low-level implementation details into high-level specifications. This format enables conceptual reproducibility: verifying claims via independent, agent-generated implementations rather than brittle source code replication. We contribute: (1) the Croissant Tasks specification, formally decoupling task problem from solution; (2) an automated LLM pipeline that retrofits existing benchmarks into this format; and (3) empirical validation showing autonomous agents can ingest these specifications to generate functional, accurate reproduction pipelines from scratch. We envision this format as a new foundation for automated and conceptual reproducibility in machine learning.
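The idea of conceptual reproducibility described in the abstract can be illustrated with a small sketch. It is not the Croissant Tasks schema: the `TaskSpec` fields, the `tolerance` parameter, the `reproduces` helper, and the toy parity task are all hypothetical names chosen for illustration. The sketch only shows the general pattern of keeping the task description (data, metric, claimed score) separate from the solution, then checking whether an independently written solution lands close to the claim.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical declarative task description (NOT the real Croissant Tasks schema)."""
    name: str
    inputs: Sequence[int]      # evaluation inputs
    labels: Sequence[int]      # expected outputs
    metric: str                # name of the metric to compute
    claimed_score: float       # score reported by the original work
    tolerance: float = 0.01    # assumed acceptance band for a reproduction


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Metrics are looked up by name, so the spec never references solution code.
METRICS = {"accuracy": accuracy}


def reproduces(spec: TaskSpec, solution: Callable[[int], int]) -> bool:
    """Score an independently written solution and compare it to the claim."""
    predictions = [solution(x) for x in spec.inputs]
    score = METRICS[spec.metric](predictions, spec.labels)
    return abs(score - spec.claimed_score) <= spec.tolerance


# Toy task: predict the parity of an integer.
spec = TaskSpec(
    name="parity-toy",
    inputs=[1, 2, 3, 4],
    labels=[1, 0, 1, 0],
    metric="accuracy",
    claimed_score=1.0,
)


def independent_solution(x: int) -> int:
    """A from-scratch implementation that never saw the original code."""
    return x % 2


result = reproduces(spec, independent_solution)
```

In this sketch the specification carries everything needed to judge a reproduction, so any implementation, whether written by a person or generated by an agent, can be checked against the same claim.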
Problem

Research questions and friction points this paper is trying to address.

reproducibility
machine learning
metadata format
software environments
execution details
Innovation

Methods, ideas, or system contributions that make the work stand out.

Croissant Tasks
conceptual reproducibility
machine-actionable metadata
declarative specification
automated reproduction
Authors
Omar Benjelloun
Google, Inc.
Databases, Open Data
Leonardo Martins Bianco
PhD Student, Université Paris-Saclay
Large Language Models, Robustness, Community detection, Graph estimation problems
Isabelle Guyon
Director of Research, Google; Prof. UPSaclay; President ChaLearn
Machine Learning
Thanh Gia Hieu Khuong
Université Paris-Saclay, Gif-sur-Yvette, France
Jonathan Lebensold
Jetty, Montreal, QC, Canada; Mila, Quebec AI Institute, Montreal, QC, Canada
Sebastian Lobentanzer
Inst. of Computational Biology, Helmholtz Munich, Germany; German Center for Diabetes Research, Munich, Germany; School of CIT, TUM, Munich, Germany; Helmholtz AI, Munich, Germany
Luis Oala
Founder and Chief AI Officer at Brickroad
Machine Learning
Benedictus Kent Rachmat
Université Paris-Saclay, Gif-sur-Yvette, France
Ihsan Ullah
University of Balochistan, Quetta, Pakistan
P2P video streaming, P2P IPTV, IPTV User Behavior, IoT, Multimedia Communication
Peyman Vahidi
Inst. of Computational Biology, Helmholtz Munich, Germany; German Center for Diabetes Research, Munich, Germany; School of CIT, TUM, Munich, Germany; Helmholtz AI, Munich, Germany
Joaquin Vanschoren
Eindhoven University of Technology; Google DeepMind (Visiting)
Artificial Intelligence, Machine Learning