Neurosymbolic Information Extraction from Transactional Documents

📅 2025-12-10
🤖 AI Summary
Problem: Existing methods for information extraction from transactional documents generalize poorly in zero-shot settings and struggle to enforce arithmetic constraints (e.g., balance consistency, quantity conservation). Method: This paper proposes a neurosymbolic framework in which large language models (LLMs) generate candidate extractions that are then filtered through three levels of symbolic validation (syntactic, task-specific, and domain-specific) to enforce arithmetic consistency. It introduces a comprehensive schema for transactional documents and uses symbolic validation to produce high-quality labels for zero-shot extraction and knowledge distillation. Coupling domain rules with neural generation is intended to improve interpretability and cross-document generalization. Results: Evaluated on relabeled transactional datasets, the framework achieves significant improvements in F1 score and accuracy, supporting the effectiveness of neurosymbolic validation for transactional document processing.
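The arithmetic constraints mentioned above can be made concrete with a small domain-level check. The sketch below assumes a hypothetical receipt schema (`line_items`, `quantity`, `unit_price`, `line_total`, `subtotal`, `tax`, `total`) and a one-cent tolerance; neither comes from the paper, whose schema may differ.

```python
from decimal import Decimal

# Hypothetical receipt schema; field names are illustrative, not the paper's.
# Monetary values are strings so Decimal arithmetic stays exact.
def arithmetic_violations(receipt: dict, tol: Decimal = Decimal("0.01")) -> list[str]:
    """Return a description of every arithmetic constraint the receipt breaks."""
    errors = []
    items = receipt["line_items"]
    # Per-line consistency: quantity * unit_price must match the line total.
    for i, item in enumerate(items):
        expected = Decimal(item["quantity"]) * Decimal(item["unit_price"])
        if abs(expected - Decimal(item["line_total"])) > tol:
            errors.append(f"line {i}: quantity * unit_price != line_total")
    # Document-level consistency: line totals sum to the subtotal.
    items_sum = sum((Decimal(it["line_total"]) for it in items), Decimal("0"))
    if abs(items_sum - Decimal(receipt["subtotal"])) > tol:
        errors.append("sum of line totals != subtotal")
    # Balance consistency: subtotal plus tax equals the total.
    subtotal, tax, total = (Decimal(receipt[k]) for k in ("subtotal", "tax", "total"))
    if abs(subtotal + tax - total) > tol:
        errors.append("subtotal + tax != total")
    return errors


receipt = {
    "line_items": [
        {"quantity": "2", "unit_price": "1.50", "line_total": "3.00"},
        {"quantity": "1", "unit_price": "4.25", "line_total": "4.25"},
    ],
    "subtotal": "7.25",
    "tax": "0.58",
    "total": "7.83",
}
print(arithmetic_violations(receipt))                       # []
print(arithmetic_violations({**receipt, "total": "8.83"}))  # ['subtotal + tax != total']
```

Returning the list of violated constraints, rather than a bare boolean, is one way to keep rejections interpretable; how the paper reports failed checks is not stated here.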

📝 Abstract
This paper presents a neurosymbolic framework for information extraction from documents, evaluated on transactional documents. We introduce a schema-based approach that integrates symbolic validation methods to enable more effective zero-shot output and knowledge distillation. The methodology uses language models to generate candidate extractions, which are then filtered through syntactic-, task-, and domain-level validation to ensure adherence to domain-specific arithmetic constraints. Our contributions include a comprehensive schema for transactional documents, relabeled datasets, and an approach for generating high-quality labels for knowledge distillation. Experimental results demonstrate significant improvements in $F_1$-scores and accuracy, highlighting the effectiveness of neurosymbolic validation in transactional document processing.
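The candidate-then-filter pipeline described in the abstract can be sketched as a cascade in which each candidate must pass syntactic, task, and domain validation in turn. This is a minimal illustration under stated assumptions: the fields in `REQUIRED_FIELDS`, the single `subtotal + tax == total` rule, and the first-valid selection policy are hypothetical, and the LLM call is replaced by a fixed list of raw output strings.

```python
import json
from decimal import Decimal, InvalidOperation
from typing import Optional

# Hypothetical minimal schema for illustration only.
REQUIRED_FIELDS = ("vendor", "subtotal", "tax", "total")
MONEY_FIELDS = ("subtotal", "tax", "total")


def syntactic_check(raw: str) -> Optional[dict]:
    """Level 1: the raw LLM output must parse as a JSON object."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def task_check(obj: dict) -> bool:
    """Level 2: every schema field is present with a usable type."""
    if any(k not in obj for k in REQUIRED_FIELDS) or not isinstance(obj["vendor"], str):
        return False
    try:
        # Monetary fields must parse as finite decimals (rejects NaN/Infinity).
        return all(Decimal(str(obj[k])).is_finite() for k in MONEY_FIELDS)
    except InvalidOperation:
        return False


def domain_check(obj: dict, tol: Decimal = Decimal("0.01")) -> bool:
    """Level 3: the arithmetic constraint subtotal + tax == total holds."""
    subtotal, tax, total = (Decimal(str(obj[k])) for k in MONEY_FIELDS)
    return abs(subtotal + tax - total) <= tol


def select_candidate(candidates: list[str]) -> Optional[dict]:
    """Return the first candidate that passes all three levels, else None."""
    for raw in candidates:
        obj = syntactic_check(raw)
        if obj is not None and task_check(obj) and domain_check(obj):
            return obj
    return None


candidates = [
    "Sure, the total is 10.80.",                                                  # fails level 1
    '{"vendor": "ACME", "subtotal": "10.00", "tax": "0.80"}',                     # fails level 2
    '{"vendor": "ACME", "subtotal": "10.00", "tax": "0.80", "total": "10.08"}',   # fails level 3
    '{"vendor": "ACME", "subtotal": "10.00", "tax": "0.80", "total": "10.80"}',   # passes
]
print(select_candidate(candidates)["total"])  # 10.80
```

Ordering the checks from cheapest (parsing) to most domain-specific means arithmetic rules only ever run on well-formed, schema-complete candidates.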
Problem

Develops a neurosymbolic framework for transactional document information extraction
Integrates symbolic validation to enhance zero-shot performance and knowledge distillation
Enforces domain-specific arithmetic constraints through multi-level (syntactic, task, and domain) validation
Innovation

Neurosymbolic framework combines LLM-generated candidate extractions with symbolic validation
Schema-based approach enables more effective zero-shot extraction and knowledge distillation
Validation filters candidates at the syntactic, task, and domain levels
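The same validation filter can drive label generation for knowledge distillation: keep a teacher label only when it passes validation, and drop documents with no valid candidate. The sketch below is hypothetical throughout: `toy_teacher` stands in for sampling several LLM outputs, `totals_consistent` stands in for the full three-level validation, and dropping (rather than relabeling) unvalidated documents is an assumed policy, not one confirmed by the paper.

```python
from decimal import Decimal
from typing import Callable, Iterable

Label = dict  # one structured extraction (hypothetical flat schema)


def build_distillation_set(
    documents: Iterable[str],
    teacher: Callable[[str], list[Label]],
    is_valid: Callable[[Label], bool],
) -> tuple[list[tuple[str, Label]], int]:
    """Pair each document with its first validated teacher label.

    Documents for which no candidate passes validation are dropped
    instead of being labelled with an unverified output.
    """
    pairs: list[tuple[str, Label]] = []
    rejected = 0
    for doc in documents:
        label = next((c for c in teacher(doc) if is_valid(c)), None)
        if label is None:
            rejected += 1
        else:
            pairs.append((doc, label))
    return pairs, rejected


def totals_consistent(label: Label) -> bool:
    """Stand-in for the full validation: subtotal + tax must equal total."""
    return Decimal(label["subtotal"]) + Decimal(label["tax"]) == Decimal(label["total"])


def toy_teacher(doc: str) -> list[Label]:
    """Hypothetical stub for sampling several candidates from an LLM teacher."""
    if doc == "receipt-1":
        return [
            {"subtotal": "5.00", "tax": "0.40", "total": "5.04"},  # invalid
            {"subtotal": "5.00", "tax": "0.40", "total": "5.40"},  # valid
        ]
    return [{"subtotal": "3.00", "tax": "0.24", "total": "3.42"}]  # invalid


pairs, rejected = build_distillation_set(["receipt-1", "receipt-2"], toy_teacher, totals_consistent)
print(len(pairs), rejected)  # 1 1
```

The surviving pairs would serve as training targets for a smaller student model; any further weighting or filtering the paper applies is not covered by this sketch.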