🤖 AI Summary
Problem: Existing methods for information extraction from financial transaction documents generalize poorly in zero-shot settings and struggle to enforce arithmetic constraints (e.g., balance consistency, quantity conservation).
Method: This paper proposes a neuro-symbolic framework in which large language models (LLMs) generate candidate extractions and a three-tier symbolic verification mechanism (syntactic, task-specific, and domain-specific) filters them to enforce arithmetic consistency. It introduces a comprehensive structured schema tailored to transaction documents and uses the verified outputs as labels for symbolically guided zero-shot knowledge distillation. Integrating domain rules with neural generation is intended to improve interpretability and cross-document generalization.
Results: Evaluated on a re-annotated transaction dataset, the framework achieves significant improvements in F1 score and accuracy. Empirical results validate the effectiveness and robustness of this neuro-symbolic verification paradigm for financial document processing.
📝 Abstract
This paper presents a neurosymbolic framework for information extraction from documents, evaluated on transactional documents. We introduce a schema-based approach that integrates symbolic validation methods to enable more effective zero-shot output and knowledge distillation. The methodology uses language models to generate candidate extractions, which are then filtered through syntactic-, task-, and domain-level validation to ensure adherence to domain-specific arithmetic constraints. Our contributions include a comprehensive schema for transactional documents, relabeled datasets, and an approach for generating high-quality labels for knowledge distillation. Experimental results demonstrate significant improvements in $F_1$-scores and accuracy, highlighting the effectiveness of neurosymbolic validation in transactional document processing.
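The generate-then-validate pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`line_items`, `quantity`, `unit_price`, `amount`, `subtotal`, `tax`, `total`), the function names, and the specific arithmetic rules are assumptions chosen for illustration, and the candidates are assumed to arrive as JSON strings from a language model. Each candidate passes through a syntactic check (it parses as a JSON object), a task-level check (it matches the assumed schema), and a domain-level check (its amounts are arithmetically consistent).

```python
import json
from decimal import Decimal, InvalidOperation

# Hypothetical schema fields; the paper's actual schema is not given here.
TOP_LEVEL_FIELDS = ("line_items", "subtotal", "tax", "total")
ITEM_FIELDS = ("quantity", "unit_price", "amount")


def to_decimal(value):
    """Convert a JSON value to a finite Decimal, or return None."""
    try:
        number = Decimal(str(value))
    except InvalidOperation:
        return None
    return number if number.is_finite() else None


def parse_candidate(raw):
    """Syntactic tier: the candidate must parse as a JSON object."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return doc if isinstance(doc, dict) else None


def matches_schema(doc):
    """Task tier: required fields exist and hold numeric values."""
    if not all(key in doc for key in TOP_LEVEL_FIELDS):
        return False
    if not isinstance(doc["line_items"], list):
        return False
    for item in doc["line_items"]:
        if not isinstance(item, dict):
            return False
        if any(to_decimal(item.get(key)) is None for key in ITEM_FIELDS):
            return False
    return all(to_decimal(doc[key]) is not None
               for key in ("subtotal", "tax", "total"))


def is_arithmetically_consistent(doc):
    """Domain tier: assumed rules linking items, subtotal, tax, and total."""
    items = doc["line_items"]
    for item in items:
        expected = to_decimal(item["quantity"]) * to_decimal(item["unit_price"])
        if expected != to_decimal(item["amount"]):
            return False
    subtotal = sum((to_decimal(i["amount"]) for i in items), Decimal("0"))
    if subtotal != to_decimal(doc["subtotal"]):
        return False
    return subtotal + to_decimal(doc["tax"]) == to_decimal(doc["total"])


def filter_candidates(raw_candidates):
    """Keep only the candidates that pass all three validation tiers."""
    kept = []
    for raw in raw_candidates:
        doc = parse_candidate(raw)
        if doc is None or not matches_schema(doc):
            continue
        if is_arithmetically_consistent(doc):
            kept.append(doc)
    return kept
```

Candidates that survive `filter_candidates` could then serve as labels for distilling a smaller model, which is the role the abstract assigns to validated extractions. The sketch compares amounts exactly using `Decimal`; a real system would probably need a rounding tolerance and currency-aware parsing.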