NFDI4DS Shared Tasks for Scholarly Document Processing

📅 2025-09-26
🤖 AI Summary
Problem: The scholarly document processing community lacks standardized evaluation protocols and consistent adherence to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Method: This work establishes an integrated, open shared-task framework of 12 scenario-specific tasks, including citation parsing, scientific term recognition, and figure-table understanding, unified under a reproducible evaluation infrastructure that combines NLP, machine learning, and data infrastructure technologies. Open datasets, benchmark models, and open-source toolkits are released alongside each task. Contribution/Results: The initiative operationalizes FAIRness and methodological transparency at scale. Multiple tasks have been adopted as official shared tasks at top-tier conferences (e.g., ACL, EMNLP), and the resulting resources are integrated into the German National Research Data Infrastructure for Data Science and Artificial Intelligence (NFDI4DS), improving method comparability, result discoverability, and cross-community collaboration.

📝 Abstract
Shared tasks are powerful tools for advancing research through community-based standardised evaluation. As such, they play a key role in promoting findable, accessible, interoperable, and reusable (FAIR), as well as transparent and reproducible research practices. This paper presents an updated overview of twelve shared tasks developed and hosted under the German National Research Data Infrastructure for Data Science and Artificial Intelligence (NFDI4DS) consortium, covering a diverse set of challenges in scholarly document processing. Hosted at leading venues, the tasks foster methodological innovations and contribute open-access datasets, models, and tools for the broader research community, which are integrated into the consortium's research data infrastructure.
Problem

Research questions and friction points this paper is trying to address.

Advancing research through community-based standardized evaluation
Promoting FAIR and reproducible research practices
Addressing diverse challenges in scholarly document processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shared tasks for community-based standardized evaluation
FAIR, transparent, and reproducible research practices
Integration of open-access datasets, models, and tools
Raia Abu Ahmad
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), Berlin, Germany
Rana Abdulla
Leuphana University of Lüneburg, Germany
Tilahun Abedissa Taffa
Leuphana University of Lüneburg, Germany
Soeren Auer
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Hamed Babaei Giglou
TIB — Leibniz Information Centre for Science and Technology
NLP, LLMs, Reinforcement Learning, Ontology Engineering, Semantic Web
Ekaterina Borisova
Institute of Electronics, Bulgarian Academy of Sciences
skin cancer diagnosis, fluorescence spectroscopy, gastrointestinal tract cancer diagnosis, reflectance spectroscopy of pigmented
Zongxiong Chen
Fraunhofer Institute for Open Communication Systems FOKUS, Berlin, Germany
Stefan Dietze
Full Professor (Heinrich-Heine-University Düsseldorf) & Scientific Director (KTS, GESIS)
Knowledge Graphs, Information Retrieval, Web Science, NLP
Jennifer DSouza
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Mayra Elwes
University Hospital Cologne, University of Cologne, Institute for Biomedical Informatics, Germany
Genet-Asefa Gesese
FIZ-Karlsruhe – Leibniz-Institute for Information Infrastructure, Germany
Shufan Jiang
East China University of Science and Technology
Large Language Models, Multi-Agent Systems, Scaling Environment for Agents, World Models
Ekaterina Kutafina
University Hospital Cologne, University of Cologne, Institute for Biomedical Informatics, Germany
Philipp Mayr
GESIS - Leibniz Institute for the Social Sciences
Interactive Information Retrieval, Informetrics, Digital libraries, Information Seeking, Dataset Search
Georg Rehm
Principal Researcher and Research Fellow, DFKI GmbH
Natural Language Processing, Artificial Intelligence, Language Technology, Computational Linguistics, Semantic Web
Sameer Sadruddin
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Sonja Schimmler
Fraunhofer FOKUS
Daniel Schneider
Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Germany
Kanishka Silva
GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany
Sharmila Upadhyaya
GESIS Leibniz Institute for the Social Sciences
Ricardo Usbeck
Full Professor of AI and Explainability at Leuphana University Lüneburg
Artificial Intelligence, Knowledge Graphs, Question Answering, Entity Linking, Sustainability