🤖 AI Summary
The scholarly document processing community lacks standardized evaluation protocols and consistent adherence to FAIR (Findable, Accessible, Interoperable, Reusable) principles. Method: This work establishes an integrated, open shared-task framework comprising twelve diverse, scenario-specific tasks—including citation parsing, scientific term recognition, and figure-table understanding—unified under a reproducible evaluation infrastructure that combines NLP, machine learning, and data infrastructure technologies. High-quality open datasets, benchmark models, and open-source toolkits are systematically released alongside each task. Contribution/Results: This is the first large-scale initiative in the field to operationalize FAIRness and methodological transparency. Multiple tasks have been adopted as official shared tasks at top-tier venues (e.g., ACL, EMNLP), and the resulting resources are integrated into the German National Research Data Infrastructure for Data Science and Artificial Intelligence (NFDI4DS), substantially enhancing method comparability, result discoverability, and cross-community collaboration.
📝 Abstract
Shared tasks are powerful tools for advancing research through community-based standardised evaluation. As such, they play a key role in promoting findable, accessible, interoperable, and reusable (FAIR), as well as transparent and reproducible, research practices. This paper presents an updated overview of twelve shared tasks developed and hosted under the German National Research Data Infrastructure for Data Science and Artificial Intelligence (NFDI4DS) consortium, covering a diverse set of challenges in scholarly document processing. Hosted at leading venues, the tasks foster methodological innovation and contribute open-access datasets, models, and tools to the broader research community, all of which are integrated into the consortium's research data infrastructure.