Practical Author Name Disambiguation under Metadata Constraints: A Contrastive Learning Approach for Astronomy Literature

📅 2025-11-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address author name ambiguity caused by sparse metadata in large digital libraries (e.g., NASA/ADS), this paper proposes the Neural Author Name Disambiguator, a contrastive learning framework with low metadata dependency. It formulates disambiguation as a similarity learning task and uses a Siamese neural network over foundation-model embeddings of author names, paper titles, and abstracts, avoiding reliance on metadata such as affiliations or venues, which is often unavailable or inconsistent. The authors also construct the Large-Scale Physics ORCiD Linked dataset by cross-matching NASA/ADS publications with ORCiD, which supplies ground-truth identities for evaluation. Experiments report up to 94% accuracy on pairwise disambiguation and over 95% F1 on clustering publications into researcher identities. The testing dataset is released as a benchmark for physics and astronomy to support evaluation of future disambiguation methods under realistic, low-metadata conditions.

📝 Abstract

The ability to distinctly and properly collate an individual researcher's publications is crucial for ensuring appropriate recognition, guiding the allocation of research funding, and informing hiring decisions. However, accurately grouping and linking a researcher's entire body of work with their individual identity is challenging because of widespread name ambiguity across the growing literature. Algorithmic author name disambiguation provides a scalable approach to disambiguating author identities, yet existing methods have limitations. Many modern author name disambiguation methods rely on comprehensive metadata features such as venue or affiliation. Despite advancements in digitally indexing publications, metadata is often unavailable or inconsistent in large digital libraries (e.g., NASA/ADS). We introduce the Neural Author Name Disambiguator, a method that disambiguates author identities in large digital libraries despite limited metadata availability. We formulate the disambiguation task as a similarity learning problem by employing a Siamese neural network to disambiguate author names across publications, relying solely on widely available publication metadata: author names, titles, and abstracts. We construct the Large-Scale Physics ORCiD Linked dataset to evaluate the Neural Author Name Disambiguator by cross-matching NASA/ADS publications with ORCiD. By leveraging foundation models to embed metadata into features, our model achieves up to 94% accuracy in pairwise disambiguation and over 95% F1 in clustering publications into their researcher identities. We release the testing dataset as a benchmark for physics and astronomy, providing realistic evaluation conditions for future disambiguation methods. The Neural Author Name Disambiguator algorithm demonstrates effective disambiguation with minimal metadata, offering a scalable solution for name ambiguity in large digital libraries.

Problem

Research questions and friction points this paper is trying to address.

Disambiguating author identities in astronomy literature with limited metadata

Solving name ambiguity issues in large digital libraries like NASA/ADS

Grouping publications by researcher identity using minimal available information
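The grouping step can be illustrated with a small sketch. The abstract does not say which clustering procedure the paper uses, so the approach below (merging publications that a pairwise model judged "same author" into connected components with union-find) is an assumption chosen for simplicity, and the function name cluster_by_pairs is hypothetical.

```python
def cluster_by_pairs(n_items, same_author_pairs):
    """Group publication indices 0..n_items-1 into identities.

    same_author_pairs: iterable of (i, j) index pairs that a pairwise
    model has labeled as written by the same person (assumed input).
    """
    parent = list(range(n_items))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in same_author_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Five publications; the pairwise model linked 0-1, 1-2 and 3-4.
    print(cluster_by_pairs(5, [(0, 1), (1, 2), (3, 4)]))
```

One consequence of this design is that a single false-positive pair merges two researchers into one cluster, so a real system would likely pair it with a conservative decision threshold or a different clustering rule.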

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a Siamese neural network for similarity learning

Leverages foundation models to embed metadata features

Relies solely on author names, titles and abstracts
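The three points above can be sketched as a toy pairwise scorer. This is not the paper's model: the hashed bag-of-words function toy_embed stands in for a foundation-model embedding, the dimension DIM and the record fields are illustrative assumptions, and no training is shown. What it does illustrate is the Siamese idea that both records pass through the same encoder and are compared by a similarity score.

```python
import hashlib
import math

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def toy_embed(text):
    """Stand-in for a foundation-model embedding: hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # hashlib gives a hash that is stable across Python processes.
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    return vec


def encode(record):
    """Shared branch: the same encoder is applied to every record."""
    text = " ".join([record["author"], record["title"], record["abstract"]])
    return toy_embed(text)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty record carries no evidence
    return dot / (norm_u * norm_v)


def same_author_score(rec_a, rec_b):
    """Similarity of two publication records using only name, title, abstract."""
    return cosine(encode(rec_a), encode(rec_b))


if __name__ == "__main__":
    paper_1 = {"author": "J. Smith", "title": "Dust in spiral galaxies",
               "abstract": "We measure dust emission in nearby spiral galaxies."}
    paper_2 = {"author": "J. Smith", "title": "Protein folding kinetics",
               "abstract": "We simulate folding pathways of small proteins."}
    print(same_author_score(paper_1, paper_2))
```

In a trained Siamese network the encoder weights would be learned so that records by the same researcher score high and records by different researchers with the same name score low; a threshold on the score then yields the pairwise decision.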
