🤖 AI Summary
Problem: Existing XBRL benchmarks reduce annotation to flat multi-class classification and focus solely on narrative text, neglecting structured tables and semantic alignment against the full US-GAAP taxonomy. Method: FinTagging introduces the first end-to-end, table-aware benchmark for financial reporting. It decomposes XBRL annotation into two distinct tasks, Financial Named Entity Identification (FinNI) and Taxonomy Concept Linking (FinCL), and covers over 10,000 US-GAAP items. This fine-grained decomposition is paired with full-taxonomy semantic alignment, so the benchmark exercises zero-shot LLMs on table understanding, named entity recognition (NER), and ontology-driven concept matching. Contribution/Results: Experiments reveal that LLMs perform well on coarse-grained entity extraction but struggle to distinguish semantically similar US-GAAP concepts; overall annotation accuracy remains below practical deployment thresholds.
📝 Abstract
We introduce FinTagging, the first full-scope, table-aware XBRL benchmark designed to evaluate the structured information extraction and semantic alignment capabilities of large language models (LLMs) in XBRL-based financial reporting. Unlike prior benchmarks, which oversimplify XBRL tagging as flat multi-class classification and focus solely on narrative text, FinTagging decomposes the tagging problem into two subtasks: FinNI for financial entity extraction and FinCL for taxonomy-driven concept alignment. It requires models to jointly extract facts and align them with the full US-GAAP taxonomy of more than 10,000 concepts, across both unstructured text and structured tables, enabling realistic, fine-grained evaluation. We assess a diverse set of LLMs in zero-shot settings, systematically analyzing their performance on both subtasks and their overall tagging accuracy. Our results reveal that, while LLMs demonstrate strong generalization in information extraction, they struggle with fine-grained concept alignment, particularly in disambiguating closely related taxonomy entries. These findings highlight the limitations of existing LLMs in fully automating XBRL tagging and underscore the need for improved semantic reasoning and schema-aware modeling to meet the demands of accurate financial disclosure. Code is available in our GitHub repository, and data in our Hugging Face repository.
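The two-subtask decomposition described in the abstract can be illustrated with a minimal sketch. This is not the benchmark's code or any evaluated model: the names `Fact`, `identify_facts`, `link_concept`, and `TAXONOMY` are hypothetical, the regex extractor stands in for FinNI, and the token-overlap ranker stands in for FinCL. The three concept names are a tiny illustrative subset of US-GAAP, and the labels attached to them are simplified for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Fact:
    value: str    # numeric string as it appears in the text
    context: str  # surrounding text, used later for concept linking


# Tiny illustrative subset (assumption); the full taxonomy has 10,000+ concepts.
TAXONOMY = {
    "Revenues": "total revenues",
    "NetIncomeLoss": "net income loss",
    "OperatingIncomeLoss": "operating income loss",
}

NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")


def identify_facts(sentence: str) -> list[Fact]:
    """Stage 1 (FinNI-like): find numeric mentions and keep their context."""
    return [
        Fact(m.group().lstrip("$").rstrip(","), sentence)
        for m in NUMBER.finditer(sentence)
    ]


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_concept(fact: Fact, taxonomy: dict[str, str] = TAXONOMY) -> list[tuple[str, float]]:
    """Stage 2 (FinCL-like): rank concepts by Jaccard overlap with the context."""
    ctx = tokens(fact.context)
    scored = []
    for name, label in taxonomy.items():
        lab = tokens(label)
        scored.append((name, len(ctx & lab) / len(ctx | lab)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for fact in identify_facts("Operating income was $1,250 million for the year."):
        print(fact.value, link_concept(fact))
```

In this toy setting, `OperatingIncomeLoss` and `NetIncomeLoss` both share the token "income" with the sentence, so their scores differ only by a narrow margin. That is a small-scale picture of the disambiguation difficulty the abstract reports for closely related taxonomy entries, which a flat classifier over a handful of frequent tags would never encounter.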