An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

📅 2026-03-11

🤖 AI Summary

This study addresses the challenge of sustaining subject indexing in large-scale multilingual digital libraries by combining authority control with extreme multi-label text classification (XMTC). The authors build a large bilingual (English/German) corpus of bibliographic records annotated with subject terms from the Integrated Authority File (GND), together with a machine-actionable representation of the GND subject taxonomy. These resources support ontology-aware multi-label classification, the mapping of text to authority terms, and transparent, reproducible, authority-grounded evaluation of AI-assisted cataloging. The main contributions are the release of the bilingual annotated corpus and the machine-actionable GND taxonomy, which allow AI cataloging systems to be assessed along three dimensions: accuracy, practical usefulness, and transparency.
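A "machine-actionable" taxonomy can be pictured as a mapping from each authority ID to its preferred labels in both languages and its broader terms. The sketch below assumes such a structure; the class `AuthorityTerm`, its field names, the helper functions, and the IDs `T1` to `T3` are all hypothetical and do not reflect the released format. It shows one common way to make multi-label classification ontology-aware: expanding a flat label set with every ancestor in the hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityTerm:
    """Hypothetical record for one subject heading in a GND-like taxonomy."""
    term_id: str
    label_en: str
    label_de: str
    broader: tuple[str, ...] = ()  # IDs of broader (parent) terms


def ancestors(term_id: str, taxonomy: dict[str, AuthorityTerm]) -> set[str]:
    """Return all broader terms reachable from term_id, excluding term_id itself."""
    seen: set[str] = set()
    stack = list(taxonomy[term_id].broader)
    while stack:
        parent = stack.pop()
        if parent not in seen:  # the seen-set also guards against cycles
            seen.add(parent)
            stack.extend(taxonomy[parent].broader)
    return seen


def expand_labels(labels: set[str], taxonomy: dict[str, AuthorityTerm]) -> set[str]:
    """Add every ancestor of each assigned label (hierarchical label closure)."""
    expanded = set(labels)
    for term_id in labels:
        expanded |= ancestors(term_id, taxonomy)
    return expanded


# Toy taxonomy with made-up IDs and labels, for illustration only.
taxonomy = {
    "T1": AuthorityTerm("T1", "Computer science", "Informatik"),
    "T2": AuthorityTerm("T2", "Machine learning", "Maschinelles Lernen", ("T1",)),
    "T3": AuthorityTerm("T3", "Text classification", "Textklassifikation", ("T2",)),
}

print(sorted(expand_labels({"T3"}, taxonomy)))  # ['T1', 'T2', 'T3']
```

Keeping both language labels on the same record means an English or German record can be mapped to the same authority ID, which is the point of anchoring predictions in the authority file rather than in free-text keywords.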

📝 Abstract

Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
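The abstract does not name the evaluation metrics. Precision@k and recall@k over ranked label lists are common in XMTC, so the sketch below assumes them; it also adds a simple "authority-grounded" check that flags predicted IDs absent from the authority file. The function names, the ID set, and the toy predictions are hypothetical illustrations, not the authors' evaluation protocol.

```python
def precision_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """Fraction of the top-k predicted IDs that appear in the gold label set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for label in ranked[:k] if label in gold)
    return hits / k  # divides by k even if fewer than k predictions exist


def recall_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """Fraction of the gold labels recovered within the top-k predictions."""
    if not gold:
        return 0.0
    return len(set(ranked[:k]) & gold) / len(gold)


def off_authority(ranked: list[str], authority_ids: set[str]) -> list[str]:
    """Predicted IDs that do not exist in the authority file (e.g. invented terms)."""
    return [label for label in ranked if label not in authority_ids]


authority_ids = {"T1", "T2", "T3", "T4"}
gold = {"T2", "T3"}
ranked = ["T3", "T9", "T2", "T1"]  # system output, best first; "T9" is not an authority term

print(precision_at_k(ranked, gold, 3))        # 0.6666666666666666
print(recall_at_k(ranked, gold, 3))           # 1.0
print(off_authority(ranked, authority_ids))   # ['T9']
```

Reporting the off-authority list alongside accuracy scores is one way to make an evaluation both reproducible and transparent: a cataloger can see exactly which suggestions could not be linked to a GND record.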

Problem

Research questions and friction points this paper is trying to address.

Extreme Multi-label Text Classification

Subject Indexing

Digital Libraries

Authority Control

Multilingual Cataloging

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extreme Multi-label Text Classification

Authority-controlled Indexing

Bilingual Library Corpus