🤖 AI Summary
The scholarly document processing community lacks standardized evaluation protocols and consistent adherence to FAIR (Findable, Accessible, Interoperable, Reusable) principles. Method: This work establishes an integrated, open shared-task framework comprising twelve diverse, scenario-specific tasks—including citation parsing, scientific term recognition, and figure-table understanding—unified under a reproducible evaluation infrastructure that combines NLP, machine learning, and data infrastructure technologies. High-quality open datasets, benchmark models, and open-source toolkits are systematically released alongside each task. Contribution/Results: This is the first large-scale initiative in the field to operationalize FAIRness and methodological transparency. Multiple tasks have been adopted as official shared tasks at top-tier venues (e.g., ACL, EMNLP), and the resulting resources are integrated into the German National Research Data Infrastructure for Data Science and Artificial Intelligence (NFDI4DS), substantially enhancing method comparability, result discoverability, and cross-community collaboration.
📝 Abstract
Shared tasks are powerful tools for advancing research through community-based standardised evaluation. As such, they play a key role in promoting findable, accessible, interoperable, and reusable (FAIR), as well as transparent and reproducible, research practices. This paper presents an updated overview of twelve shared tasks developed and hosted under the German National Research Data Infrastructure for Data Science and Artificial Intelligence (NFDI4DS) consortium, covering a diverse set of challenges in scholarly document processing. Hosted at leading venues, the tasks foster methodological innovation and contribute open-access datasets, models, and tools to the broader research community, all of which are integrated into the consortium's research data infrastructure.