🤖 AI Summary
This study addresses the challenge of sustaining subject indexing in large-scale multilingual digital libraries by combining authority control with extreme multi-label text classification (XMTC). The authors construct a large bilingual (English–German) corpus of bibliographic records annotated with subject terms from the Integrated Authority File (GND), together with a machine-actionable representation of the GND taxonomy. Linking record text to authority terms in this way enables ontology-aware multi-label classification and supports transparent, reproducible, and practically oriented AI-assisted cataloging. The main contributions are the bilingual annotated corpus and the machine-actionable GND taxonomy. Together they support evaluating AI-based cataloging systems along three dimensions: accuracy, practical utility, and transparency.
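Neither the summary nor the abstract specifies the schema of the machine-actionable GND taxonomy. The sketch below assumes only that each concept has English and German labels and links to broader concepts. It shows how such links let a classifier's label set be closed upward, which is one simple way to make multi-label evaluation ontology-aware. The `Taxonomy` class, its methods, and the IDs `X1` to `X3` are hypothetical and do not reflect actual GND records.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Taxonomy:
    """Toy taxonomy (hypothetical schema): bilingual labels plus broader-term links."""

    labels: dict[str, dict[str, str]] = field(default_factory=dict)  # id -> {"en": ..., "de": ...}
    broader: dict[str, set[str]] = field(default_factory=dict)  # id -> set of broader ids

    def add(self, concept_id: str, en: str, de: str, broader: Iterable[str] = ()) -> None:
        self.labels[concept_id] = {"en": en, "de": de}
        self.broader[concept_id] = set(broader)

    def ancestors(self, concept_id: str) -> set[str]:
        """Return every transitively broader concept; `seen` prevents revisiting nodes."""
        seen: set[str] = set()
        stack = list(self.broader.get(concept_id, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.broader.get(node, ()))
        return seen

    def expand(self, concept_ids: Iterable[str]) -> set[str]:
        """Close a label set upward, e.g. to give partial credit for broader terms."""
        ids = set(concept_ids)
        closed = set(ids)
        for cid in ids:
            closed |= self.ancestors(cid)
        return closed


# Hypothetical IDs and a toy hierarchy, not actual GND records.
tax = Taxonomy()
tax.add("X1", "Computer science", "Informatik")
tax.add("X2", "Artificial intelligence", "Künstliche Intelligenz", broader=["X1"])
tax.add("X3", "Machine learning", "Maschinelles Lernen", broader=["X2"])
print(sorted(tax.expand({"X3"})))  # ['X1', 'X2', 'X3']
```

Closing both predicted and gold sets upward before scoring rewards a system that suggests a broader term instead of the exact one. Whether that credit is appropriate is a cataloging policy question, not a technical one.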
📝 Abstract
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but also usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
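The abstract does not say which metrics underlie its authority-grounded evaluation. As a minimal sketch, assume each record's gold annotation is a set of GND identifiers and a system returns a ranked list of identifiers. Two metrics commonly used in extreme multi-label classification, precision@k and micro-averaged precision, recall, and F1, can then be computed as below. The function names and placeholder IDs are hypothetical and not part of the released resource.

```python
from typing import Iterable, Sequence


def precision_at_k(ranked: Sequence[str], gold: set[str], k: int) -> float:
    """Share of the top-k ranked authority IDs found in the gold set (divides by k)."""
    hits = sum(1 for cid in ranked[:k] if cid in gold)
    return hits / k


def micro_prf(
    predictions: Iterable[Iterable[str]], golds: Iterable[Iterable[str]]
) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-record label sets."""
    tp = fp = fn = 0
    for pred, gold in zip(predictions, golds):
        pred_set, gold_set = set(pred), set(gold)
        tp += len(pred_set & gold_set)  # correctly assigned authority terms
        fp += len(pred_set - gold_set)  # suggested but not in the record
        fn += len(gold_set - pred_set)  # in the record but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Two hypothetical records with placeholder authority IDs.
gold = [{"A", "B"}, {"C"}]
ranked = [["A", "D", "B"], ["E", "C"]]
print(precision_at_k(ranked[0], gold[0], k=2))  # 0.5
print(micro_prf([r[:2] for r in ranked], gold))
```

Micro-averaging pools counts across records, so frequent subject terms weigh more. A macro average over labels would instead treat rare authority terms equally, which may matter for a large, likely long-tailed vocabulary.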