XBRLTagRec: Domain-Specific Fine-Tuning and Zero-Shot Re-Ranking with LLMs for Extreme Financial Numeral Labeling

📅 2026-03-26
🤖 AI Summary
Matching numerical facts in XBRL financial reports to the correct label is difficult because the taxonomy is large and many labels are semantically close. This work proposes an end-to-end framework, XBRLTagRec, that first fine-tunes FLAN-T5-Large on domain-specific data to generate semantic tag documents, then retrieves candidate labels by semantic similarity and applies zero-shot re-ranking with ChatGPT-3.5 to disambiguate highly similar labels. Evaluated on the FNXL dataset, the approach outperforms the state-of-the-art FLAN-FinXC framework, with improvements of 2.64%–4.47% in Hits@1 and Macro metrics. The results indicate that it is effective in extreme classification settings where label distinctions are subtle.
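To make the reported metrics concrete, here is a minimal sketch of how Hits@k and a macro-averaged score could be computed. It assumes that "Macro" refers to macro-averaged F1 over tags, which the summary does not spell out; the function names and the toy tag lists are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def hits_at_k(ranked_predictions, gold_labels, k=1):
    """Fraction of examples whose gold tag appears in the top-k ranked tags."""
    hits = sum(
        1 for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold in ranked[:k]
    )
    return hits / len(gold_labels)


def macro_f1(predicted, gold):
    """Unweighted mean of per-tag F1 over every tag seen in predicted or gold.

    Assumption: one predicted tag per numeral (single-label setting).
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for p, g in zip(predicted, gold):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was missed
    labels = set(predicted) | set(gold)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with hypothetical ranked tag lists for three numerals.
ranked = [
    ["Revenues", "SalesRevenueNet"],
    ["Assets", "Liabilities"],
    ["NetIncomeLoss", "ProfitLoss"],
]
gold = ["Revenues", "Liabilities", "ProfitLoss"]
top1 = [r[0] for r in ranked]

print(hits_at_k(ranked, gold, k=1))
print(hits_at_k(ranked, gold, k=2))
print(macro_f1(top1, gold))
```

Because macro averaging weights every tag equally, rare tags count as much as frequent ones, which is why it is a common companion to Hits@1 when the label set is large and long-tailed.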

📝 Abstract
Publicly traded companies must disclose financial information under regulations of the Securities and Exchange Commission (SEC) and the Generally Accepted Accounting Principles (GAAP). The eXtensible Business Reporting Language (XBRL), as an XML-based financial language, enables standardized and machine-readable reporting, but accurate tag selection from large taxonomies remains challenging. Existing fine-tuning-based methods struggle to distinguish highly similar XBRL tags, limiting performance in financial data matching. To address these issues, we introduce XBRLTagRec, an end-to-end framework for automated financial numeral tagging. The framework generates semantic tag documents with a fine-tuned FLAN-T5-Large model, retrieves relevant candidates via semantic similarity, and applies zero-shot re-ranking with ChatGPT-3.5 to select the optimal tag. Experiments on the FNXL dataset show that XBRLTagRec outperforms the state-of-the-art FLAN-FinXC framework, achieving 2.64%-4.47% improvements in Hits@1 and Macro metrics. These results demonstrate its effectiveness in large-scale and semantically complex tag matching scenarios.
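The retrieve-then-re-rank flow described in the abstract can be sketched roughly as follows. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a real sentence encoder, `generated_doc` stands in for the output of the fine-tuned FLAN-T5-Large model, and `llm` is any callable from prompt string to reply string, used here in place of a ChatGPT-3.5 call. All names, tag descriptions, and the prompt wording are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(generated_doc, tag_docs, top_k=3):
    """Return the top_k tag names most similar to the generated tag document."""
    q = embed(generated_doc)
    scored = sorted(
        tag_docs.items(), key=lambda kv: cosine(q, embed(kv[1])), reverse=True
    )
    return [tag for tag, _ in scored[:top_k]]


def build_rerank_prompt(sentence, numeral, candidates, tag_docs):
    """Hypothetical zero-shot prompt listing the retrieved candidates."""
    options = "\n".join(f"- {tag}: {tag_docs[tag]}" for tag in candidates)
    return (
        f"Sentence: {sentence}\n"
        f"Numeral: {numeral}\n"
        f"Candidate tags:\n{options}\n"
        "Reply with the single best tag name."
    )


def rerank(sentence, numeral, candidates, tag_docs, llm):
    """Ask the LLM to pick one candidate; fall back to the top retrieved tag."""
    answer = llm(build_rerank_prompt(sentence, numeral, candidates, tag_docs)).strip()
    return answer if answer in candidates else candidates[0]


# Hypothetical tag descriptions and inputs, for illustration only.
tag_docs = {
    "Revenues": "total revenue recognized from sales of goods and services",
    "InterestExpense": "cost of borrowed funds accounted for as interest expense",
    "IncomeTaxExpenseBenefit": "income tax expense or benefit for the period",
}
sentence = "Interest expense was $ 12.4 million for the year."
generated_doc = "interest expense on borrowed funds"  # stand-in for model output

candidates = retrieve(generated_doc, tag_docs, top_k=2)
fake_llm = lambda prompt: "InterestExpense"  # stand-in for a real LLM call
print(candidates)
print(rerank(sentence, "12.4", candidates, tag_docs, fake_llm))
```

Restricting the LLM to a short retrieved candidate list keeps the prompt small even when the full taxonomy has thousands of tags, and the fallback to the top retrieved tag guards against replies that name a tag outside the list.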
Problem

Research questions and friction points this paper is trying to address.

XBRL tagging
financial numeral labeling
semantic similarity
taxonomy matching
financial reporting
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-specific fine-tuning
zero-shot re-ranking
semantic similarity retrieval
financial numeral labeling
LLM-based tag recommendation
Authors
Gang Hu
Columbia University
System
Qun Zhang
School of Information Science & Engineering, Yunnan University, Kunming, China
Jingyao Luo
School of Information Science & Engineering, Yunnan University, Kunming, China
Yile Jiang
School of Information Science & Engineering, Yunnan University, Kunming, China
Jing Chai
School of Information Science & Engineering, Yunnan University, Kunming, China
Haiyan Ding
Tsinghua University
Neonatal cerebral function monitoring, Cardiac magnetic resonance imaging