🤖 AI Summary
Matching numerical facts to the correct labels in the large, semantically dense taxonomy of XBRL financial reports remains difficult. This work proposes an end-to-end framework that first fine-tunes FLAN-T5-Large on domain-specific data to generate semantic tag documents, then combines semantic retrieval with zero-shot re-ranking by ChatGPT-3.5 to disambiguate highly similar labels. On the FNXL dataset, the approach outperforms the current state-of-the-art model, FLAN-FinXC, with improvements of 2.64%–4.47% in Hits@1 and Macro metrics. The gains are reported for an extreme classification setting in which distinctions between labels are often subtle.
📝 Abstract
Publicly traded companies must disclose financial information in accordance with Securities and Exchange Commission (SEC) regulations and the Generally Accepted Accounting Principles (GAAP). The eXtensible Business Reporting Language (XBRL), an XML-based reporting language, enables standardized, machine-readable financial reports, but selecting the correct tag from a large taxonomy remains challenging. Existing fine-tuning-based methods struggle to distinguish highly similar XBRL tags, which limits their performance in financial data matching. To address these issues, we introduce XBRLTagRec, an end-to-end framework for automated financial numeral tagging. The framework generates semantic tag documents with a fine-tuned FLAN-T5-Large model, retrieves relevant candidates via semantic similarity, and applies zero-shot re-ranking with ChatGPT-3.5 to select the optimal tag. Experiments on the FNXL dataset show that XBRLTagRec outperforms the state-of-the-art FLAN-FinXC framework, with improvements of 2.64%–4.47% in Hits@1 and Macro metrics. These results demonstrate its effectiveness in large-scale, semantically complex tag matching scenarios.
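The retrieve-then-re-rank flow described in the abstract can be illustrated with a minimal Python sketch. This is not the XBRLTagRec implementation: the bag-of-words `embed` function stands in for a real semantic encoder, the `rerank` callable stands in for the zero-shot ChatGPT-3.5 step, and all function names, tag descriptions, and the query text are hypothetical. The `hits_at_k` helper shows how a Hits@1-style score is typically computed, assuming one gold tag per numeral.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_doc: str, tag_docs: Dict[str, str], k: int) -> List[str]:
    """Return the k tags whose documents are most similar to the query."""
    q = embed(query_doc)
    ranked = sorted(tag_docs, key=lambda t: cosine(q, embed(tag_docs[t])), reverse=True)
    return ranked[:k]


def tag_numeral(
    query_doc: str,
    tag_docs: Dict[str, str],
    rerank: Callable[[str, List[str]], List[str]],
    k: int = 3,
) -> List[str]:
    """Retrieve k candidates, then let a re-ranker (e.g. an LLM call) reorder them."""
    candidates = retrieve(query_doc, tag_docs, k)
    return rerank(query_doc, candidates)


def hits_at_k(ranked_lists: List[List[str]], gold: List[str], k: int = 1) -> float:
    """Fraction of examples whose gold tag appears in the top k predictions."""
    hits = sum(1 for ranked, g in zip(ranked_lists, gold) if g in ranked[:k])
    return hits / len(gold)


if __name__ == "__main__":
    # Hypothetical tag documents; in the paper these are generated by FLAN-T5-Large.
    tag_docs = {
        "Revenues": "total revenue recognized from customers",
        "InterestExpense": "interest expense on debt",
        "DebtInstrumentFaceAmount": "face amount of debt instrument",
    }
    identity_rerank = lambda query, candidates: candidates  # placeholder for the LLM step
    ranked = tag_numeral("revenue from customers during the quarter", tag_docs, identity_rerank)
    print(ranked[0], hits_at_k([ranked], ["Revenues"], k=1))
```

Keeping the re-ranker as a plain callable separates the cheap retrieval step from the expensive LLM call, so the candidate count `k` can be tuned independently of whichever model performs the final selection.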