🤖 AI Summary
Author affiliation strings often nest several institutions and contain extraneous noise, so normalizing them to standard institutions is error-prone, and this low accuracy hampers bibliometric analysis and interoperability across knowledge bases. Method: We propose AffRo, an end-to-end framework that jointly addresses affiliation parsing and coreference resolution by combining rule-enhanced named entity recognition (NER), hierarchical coreference resolution of organizations, and context-aware ranking of candidate matches. Contribution/Results: We introduce AffRoDB, the first expert-annotated benchmark dataset for affiliation normalization, which fills a gap in the systematic evaluation of such methods. On diverse real-world affiliation strings from multiple sources, AffRo improves F1 by 12.6 percentage points (absolute) over state-of-the-art methods, improving scholarly metadata quality and enabling organizational identifiers to interoperate across knowledge bases.
📝 Abstract
Accurate affiliation matching, which links affiliation strings to standardized organization identifiers, is critical for improving research metadata quality, enabling comprehensive bibliometric analyses, and supporting data interoperability across scholarly knowledge bases. Existing approaches fail to handle the complexity of affiliation strings, which often mention multiple organizations or include extraneous information. In this paper, we present AffRo, a novel approach designed to address these challenges using advanced parsing and disambiguation techniques. We also introduce AffRoDB, an expert-curated dataset for systematically evaluating affiliation matching algorithms and enabling robust benchmarking. Results demonstrate the effectiveness of AffRo in accurately identifying organizations in complex affiliation strings.
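To make the task concrete, the following is a deliberately naive baseline for linking a multi-organization affiliation string to organization identifiers. It is not AffRo and does not reproduce its NER, coreference, or ranking stages. Instead, it uses comma/semicolon splitting and token-set Jaccard similarity. The registry `REGISTRY`, its `org:` identifiers, the function names, and the 0.5 threshold are all hypothetical and serve only as illustration.

```python
import re

# Hypothetical toy registry: normalized organization name -> made-up identifier.
# A real system would query an external registry of organization identifiers.
REGISTRY = {
    "university of athens": "org:0001",
    "athena research center": "org:0002",
    "cnrs": "org:0003",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set Jaccard similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_affiliation(affiliation: str, threshold: float = 0.5) -> list[str]:
    """Link each organization mentioned in a raw affiliation string to a registry ID.

    The string is split on commas and semicolons, so a multi-organization
    affiliation yields several candidate segments. Segments that are not
    organizations (departments, cities, countries) fall below the threshold.
    """
    matched: list[str] = []
    for segment in re.split(r"[;,]", affiliation):
        tokens = set(normalize(segment).split())
        if not tokens:
            continue
        # Score the segment against every registry name and keep the best one.
        name, score = max(
            ((name, jaccard(tokens, set(name.split()))) for name in REGISTRY),
            key=lambda pair: pair[1],
        )
        org_id = REGISTRY[name]
        if score >= threshold and org_id not in matched:
            matched.append(org_id)
    return matched


example = ("Dept. of Informatics, University of Athens, Greece; "
           "Athena Research Center, Marousi, Greece")
print(match_affiliation(example))  # ['org:0001', 'org:0002']
```

Such a baseline only succeeds when an organization name appears verbatim in a single segment. Abbreviations, organization names that themselves contain commas, and parent/child hierarchies (a department inside a university inside a national network) defeat it. These are the complex affiliation strings that the abstract says existing approaches fail to handle.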