TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval

📅 2025-04-30
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study aims to support librarians in subject indexing by proposing a two-stage method for assigning subject tags. First, a bi-encoder efficiently retrieves a coarse set of candidate tags from a large subject taxonomy. Second, a cross-encoder re-ranks that candidate set at a finer semantic granularity. The approach frames subject tagging as a cascaded information retrieval task built on two types of encoder models. Evaluated on SemEval-2025 Task 5, the system shows significant improvements in recall over single-stage methods and competitive results in the qualitative evaluation.

📝 Abstract
We present our submission to Task 5 of SemEval-2025, which aims to aid librarians in assigning subject tags to library records by producing a list of likely relevant tags for a given document. We frame the task as an information retrieval problem, where the document content is used to retrieve subject tags from a large subject taxonomy. We leverage two types of encoder models to build a two-stage information retrieval system: a bi-encoder for coarse-grained candidate extraction in the first stage, and a cross-encoder for fine-grained re-ranking in the second stage. This approach proved effective, demonstrating significant improvements in recall compared to single-stage methods and showing competitive results according to qualitative evaluation.
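The two-stage design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy taxonomy, the bag-of-words `embed` function standing in for a bi-encoder, the token-overlap `cross_score` function standing in for a cross-encoder, and the cut-offs `k_candidates` and `k_final`. A real system would replace both scoring functions with neural encoder models.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (tag -> short description); not from the paper.
TAXONOMY = {
    "Machine learning": "algorithms that learn statistical models from data",
    "Information retrieval": "search ranking and retrieval of documents",
    "Library science": "cataloguing and indexing of library records",
    "Botany": "plants flowers and trees",
}

def embed(text):
    """Stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(doc, taxonomy, k):
    """Stage 1: document and tags are embedded separately, so tag
    vectors could be precomputed once for the whole taxonomy."""
    doc_vec = embed(doc)
    tag_vecs = {tag: embed(tag + " " + desc) for tag, desc in taxonomy.items()}
    ranked = sorted(taxonomy, key=lambda tag: cosine(doc_vec, tag_vecs[tag]), reverse=True)
    return ranked[:k]

def cross_score(doc, tag, desc):
    """Stand-in for a cross-encoder: scores the (document, tag) pair jointly."""
    doc_tokens = set(doc.lower().split())
    tag_tokens = set((tag + " " + desc).lower().split())
    overlap = len(doc_tokens & tag_tokens)
    # Give extra weight to words of the tag name that occur in the document.
    name_hits = sum(1 for t in tag.lower().split() if t in doc_tokens)
    return overlap + 2 * name_hits

def tag_document(doc, taxonomy, k_candidates=3, k_final=2):
    """Stage 2: re-rank only the stage-1 candidates with the pairwise scorer."""
    candidates = retrieve(doc, taxonomy, k_candidates)
    reranked = sorted(candidates, key=lambda tag: cross_score(doc, tag, taxonomy[tag]), reverse=True)
    return reranked[:k_final]

doc = "cataloguing and indexing library records using document retrieval"
print(tag_document(doc, TAXONOMY))  # ['Library science', 'Information retrieval']
```

The point of the split is cost: the cheap first stage scores the whole taxonomy, while the more expensive pairwise scorer only sees the `k_candidates` tags that survive it.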
Problem

Research questions and friction points this paper is trying to address.

Aiding librarians in assigning subject tags to library records
Framing subject tagging as a two-stage information retrieval problem
Improving recall over single-stage methods by combining bi-encoder and cross-encoder models
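Since recall is the quantity the listed points focus on, a small sketch of recall at a cut-off k may help make it concrete. This is the standard definition, given here as an assumption; this summary does not state the exact metric variant used in the task, and the tag names below are hypothetical.

```python
def recall_at_k(ranked_tags, gold_tags, k):
    """Fraction of gold tags that appear among the top-k ranked tags."""
    gold = set(gold_tags)
    if not gold:
        return 0.0  # convention chosen here for records with no gold tags
    hits = len(set(ranked_tags[:k]) & gold)
    return hits / len(gold)

# Hypothetical ranked output and gold annotation for one record.
ranked = ["tag_a", "tag_b", "tag_c", "tag_d"]
gold = ["tag_b", "tag_d", "tag_e"]

print(recall_at_k(ranked, gold, 2))
print(recall_at_k(ranked, gold, 4))
```

In a two-stage system, the recall of the first-stage candidate list is an upper bound on what re-ranking can achieve, because the second stage can only reorder tags that were retrieved.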
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage information retrieval system
Bi-encoder for coarse-grained extraction
Cross-encoder for fine-grained re-ranking