An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

📅 2026-03-11
🤖 AI Summary
This study addresses the challenge of sustaining subject indexing in large-scale, multilingual digital libraries by combining authority control with extreme multi-label text classification (XMTC). The authors construct a large bilingual (English–German) corpus of bibliographic records annotated with subject terms from the Integrated Authority File (GND), and they develop a machine-actionable representation of the GND subject taxonomy. Linking catalog text to this taxonomy enables ontology-aware multi-label classification and supports transparent, reproducible, practically oriented AI-assisted cataloging. The main contributions are the released bilingual annotated dataset and the machine-actionable GND taxonomy, which together allow AI-based cataloging systems to be evaluated along three dimensions: accuracy, practical utility, and transparency.
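The summary does not say how the GND hierarchy enters evaluation. One common way to make multi-label scoring ontology-aware is hierarchical precision and recall: both the predicted and the gold labels are expanded with all of their broader terms before comparison, so a prediction that is slightly too general still earns partial credit. The following is a minimal sketch under that assumption. The toy taxonomy, the `subj:` IDs, and the function names are hypothetical and are not the released GND taxonomy or the authors' code.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy taxonomy: each subject ID maps to its broader (parent) terms.
# The IDs are illustrative placeholders, NOT real GND identifiers.
BROADER: Dict[str, Set[str]] = {
    "subj:machine_learning": {"subj:artificial_intelligence"},
    "subj:artificial_intelligence": {"subj:computer_science"},
    "subj:computer_science": set(),
    "subj:digital_libraries": {"subj:library_science"},
    "subj:library_science": set(),
}


def ancestors(term: str, broader: Dict[str, Set[str]]) -> Set[str]:
    """Return all transitive broader terms of `term`, excluding `term` itself."""
    seen: Set[str] = set()
    stack = list(broader.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
    return seen


def augment(labels: Iterable[str], broader: Dict[str, Set[str]]) -> Set[str]:
    """Add every broader term of each label to the label set."""
    out: Set[str] = set()
    for term in labels:
        out.add(term)
        out |= ancestors(term, broader)
    return out


def hierarchical_prf(
    predicted: Iterable[str], gold: Iterable[str], broader: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Hierarchical precision, recall, and F1 over ancestor-augmented label sets."""
    p = augment(predicted, broader)
    g = augment(gold, broader)
    overlap = len(p & g)
    hp = overlap / len(p) if p else 0.0
    hr = overlap / len(g) if g else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


# Predicting the broader term "artificial intelligence" for a record whose gold
# label is "machine learning": exact matching scores 0, the hierarchical variant does not.
print(hierarchical_prf({"subj:artificial_intelligence"}, {"subj:machine_learning"}, BROADER))
```

In this example hierarchical precision is 1.0, recall is about 0.67, and F1 is about 0.8, whereas a flat exact-match comparison would score the prediction as entirely wrong.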

📝 Abstract
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
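The abstract does not list which accuracy metrics are reported for the three systems. In XMTC, ranked metrics such as precision@k and recall@k are a common starting point for authority-grounded evaluation, since a model ranks a very large label space and a cataloger reviewing suggestions would mostly look at the top of the list. The sketch below assumes that each record yields a ranked list of authority IDs plus a gold set of assigned IDs; the function names and the placeholder IDs are hypothetical, not part of the released resource.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k predicted IDs that are gold labels (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for term in ranked[:k] if term in gold)
    return hits / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of gold labels that appear among the top-k predicted IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not gold:
        return 0.0
    hits = sum(1 for term in ranked[:k] if term in gold)
    return hits / len(gold)


def mean_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int, metric) -> float:
    """Macro-average a per-record metric over (ranked, gold) pairs."""
    if not runs:
        raise ValueError("no records to evaluate")
    return sum(metric(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Placeholder IDs for two hypothetical catalog records.
runs = [
    (["id:A", "id:B", "id:C", "id:D"], {"id:A", "id:C", "id:E"}),
    (["id:X", "id:Y"], {"id:Y"}),
]
print(mean_at_k(runs, 1, precision_at_k), mean_at_k(runs, 3, recall_at_k))
```

Precision@k divides by k even when a system returns fewer than k suggestions, which penalizes short lists; this is one convention among several, and a real evaluation should state which one it uses.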
Problem

Research questions and friction points this paper is trying to address.

Extreme Multi-label Text Classification
Subject Indexing
Digital Libraries
Authority Control
Multilingual Cataloging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extreme Multi-label Text Classification
Authority-controlled Indexing
Bilingual Library Corpus
Ontology-aware AI
Agent-assisted Cataloging
Jennifer D'Souza
TIB Leibniz Information Centre for Science and Technology
Natural Language Processing, Scientific Knowledge Extraction, LLM Evaluation, Scientometrics
Sameer Sadruddin
TIB Leibniz Information Centre for Science and Technology, Germany
Maximilian Kähler
Deutsche Nationalbibliothek, Germany
Andrea Salfinger
University of Udine, Italy
Luca Zaccagna
University of Udine, Italy
Francesca Incitti
University of Udine, Italy
Lauro Snidaro
Associate Professor in Computer Science, University of Udine
Data Fusion, Computer Vision, Video Surveillance, Machine Learning, Multimedia
Osma Suominen
Information Systems Specialist, National Library of Finland