An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

📅 2026-03-11

🤖 AI Summary
This study addresses the challenge of sustaining subject indexing in large-scale multilingual digital libraries by combining authority control with extreme multi-label text classification (XMTC). The authors construct a large bilingual (English–German) dataset of bibliographic records annotated with the Integrated Authority File (GND) and release a machine-actionable representation of the GND taxonomy. Together, these resources enable ontology-aware multi-label classification and support transparent, reproducible, practically oriented AI-assisted cataloging. The key contributions are the annotated bilingual dataset and the machine-actionable GND taxonomy, which allow AI-based cataloging systems to be evaluated along three dimensions: accuracy, practical utility, and transparency.

📝 Abstract
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
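
To make "authority-grounded evaluation" concrete, here is a minimal sketch of how ranked subject predictions for one record might be scored against the terms a cataloger assigned. It is an illustration, not the authors' evaluation protocol: the function name, the choice of precision and recall at k, and the `gnd:` identifiers are all assumptions made for this example.

```python
from typing import Dict, List, Set


def precision_recall_at_k(predicted: List[str], gold: Set[str], k: int) -> Dict[str, float]:
    """Score one record: ranked predicted authority IDs vs. cataloger-assigned IDs.

    Hypothetical helper; the paper does not prescribe this metric.
    """
    # Keep only the k highest-ranked predictions; a set ignores duplicate IDs.
    top_k = set(predicted[:k])
    hits = len(top_k & gold)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


# Placeholder identifiers, not real GND records.
predicted_terms = ["gnd:A", "gnd:B", "gnd:C"]
gold_terms = {"gnd:A", "gnd:C", "gnd:D"}

scores = precision_recall_at_k(predicted_terms, gold_terms, k=2)
print(scores)
```

With k=2, only `gnd:A` among the top two predictions matches the gold set, so precision is 1/2 and recall is 1/3. Scores like these capture accuracy only; the usefulness and transparency dimensions the abstract calls for would need separate, likely qualitative, assessment.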
Problem

Research questions and friction points this paper is trying to address.

Extreme Multi-label Text Classification
Subject Indexing
Digital Libraries
Authority Control
Multilingual Cataloging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extreme Multi-label Text Classification
Authority-controlled Indexing
Bilingual Library Corpus
Ontology-aware AI
Agent-assisted Cataloging