An Agentic Toolkit for Adaptive Information Extraction from Regulatory Documents

📅 2025-09-15

🤖 AI Summary

Declarations of Performance (DoPs) for construction products, mandated by EU regulation, vary widely in format, language, and structure. This heterogeneity undermines the robustness and reliability of existing static or LLM-only information extraction methods, which often hallucinate. To address this, we propose an adaptive agentic system tailored to regulatory documents. It is built on a stateful Plan–Execute–Respond architecture that dynamically orchestrates multimodal parsing and domain-specific tools. The approach combines domain rules, large language models, and multi-step reasoning to support cross-format parsing and multilingual question answering, and its dynamic workflow orchestration improves traceability and alignment with user intent. Evaluated on a curated set of real-world DoPs, the system improves robustness across formats and languages for key-value pair extraction and QA, offering a scalable, auditable solution for structuring regulatory documents.
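The Plan–Execute–Respond loop can be pictured as a pipeline in which every stage reads and writes one shared state object and appends to an audit trace, which is what makes each answer traceable. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-based planner and the "key: value" line parser are toy stand-ins for the LLM-driven components, and all names (AgentState, extract_kvp, answer_question, run_agent) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    """State shared by the planner, executor and responder for one request."""
    query: str
    document: str
    plan: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)
    trace: List[Tuple[str, str]] = field(default_factory=list)  # audit log of (stage, detail)


# Hypothetical tool signature: a tool reads the state and returns new key-value results.
Tool = Callable[[AgentState], Dict[str, str]]


def extract_kvp(state: AgentState) -> Dict[str, str]:
    # Toy extractor: treats every "key: value" line as a pair (stand-in for real parsing).
    pairs = {}
    for line in state.document.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs


def answer_question(state: AgentState) -> Dict[str, str]:
    # Toy QA tool: answers with the value of the first extracted key named in the query.
    for key, value in state.results.items():
        if key in state.query.lower():
            return {"answer": value}
    return {"answer": "not found"}


TOOLS: Dict[str, Tool] = {"extract_kvp": extract_kvp, "answer_question": answer_question}


def plan(state: AgentState) -> None:
    # Toy intent inference: a question mark means QA, otherwise plain extraction.
    state.plan = ["extract_kvp", "answer_question"] if "?" in state.query else ["extract_kvp"]
    state.trace.append(("plan", ",".join(state.plan)))


def execute(state: AgentState, tools: Dict[str, Tool]) -> None:
    # Run the planned tools in order; each one sees the results of the previous ones.
    for step in state.plan:
        state.results.update(tools[step](state))
        state.trace.append(("execute", step))


def respond(state: AgentState) -> str:
    # Return the QA answer if one was produced, otherwise all extracted pairs.
    state.trace.append(("respond", f"{len(state.results)} results"))
    if "answer" in state.results:
        return state.results["answer"]
    return "; ".join(f"{k}={v}" for k, v in sorted(state.results.items()))


def run_agent(state: AgentState, tools: Dict[str, Tool] = TOOLS) -> str:
    plan(state)
    execute(state, tools)
    return respond(state)
```

For example, running run_agent on the query "What is the reaction to fire?" over the text "Product: Brick X\nReaction to fire: A1" returns "A1", and state.trace records the plan, both executed tools, and the response. Keeping all intermediate results in the state, rather than in the prompt alone, is one way to get the auditability the summary describes.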

📝 Abstract

Declaration of Performance (DoP) documents, mandated by EU regulation, certify the performance of construction products. While some of their content is standardized, DoPs vary widely in layout, language, schema, and format, which makes automated key-value pair (KVP) extraction and question answering (QA) difficult. Existing static or LLM-only information extraction (IE) pipelines often hallucinate and fail to adapt to this structural diversity. Our domain-specific, stateful agentic system addresses these challenges with a planner-executor-responder architecture. The system infers user intent, detects document modality, and orchestrates tools dynamically, enabling robust, traceable reasoning while avoiding tool misuse and execution loops. Evaluation on a curated DoP dataset demonstrates improved robustness across formats and languages, offering a scalable solution for structured data extraction in regulated workflows.
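The abstract does not say how tool misuse and execution loops are prevented. One common pattern, shown here only as a hedged sketch, is a guard the executor consults before every tool call: it rejects tools outside an allow-list, enforces a total step budget, and refuses the same call with the same arguments more than a fixed number of times. The class names CallGuard and ToolMisuseError and the default limits are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


class ToolMisuseError(RuntimeError):
    """Raised when a planned call would misuse a tool or loop."""


class CallGuard:
    """Hypothetical pre-call check for an agent executor."""

    def __init__(self, allowed_tools: Iterable[str], max_steps: int = 10, max_repeats: int = 2):
        self.allowed = set(allowed_tools)
        self.max_steps = max_steps        # total tool calls allowed per request
        self.max_repeats = max_repeats    # identical (tool, args) calls allowed
        self.steps = 0
        self.seen: Counter = Counter()

    def check(self, tool: str, args: Tuple[Hashable, ...]) -> None:
        # Misuse: the planner asked for a tool that is not registered.
        if tool not in self.allowed:
            raise ToolMisuseError(f"unknown tool: {tool}")
        # Runaway plans: cap the total number of calls.
        self.steps += 1
        if self.steps > self.max_steps:
            raise ToolMisuseError("step budget exhausted")
        # Loops: the same call with the same arguments keeps coming back.
        key = (tool, args)
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            raise ToolMisuseError(f"repeated call: {tool}{args}")
```

Counting identical (tool, args) pairs catches the typical failure where a planner keeps re-issuing a call that returned nothing useful. The executor can then hand the error back to the planner or stop and respond with what it already has.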

Problem

Adaptive extraction from diverse regulatory document layouts

Overcoming hallucinations in automated key-value pair extraction

Robust information extraction across varying formats and languages
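A simple safeguard against hallucinated key-value pairs is to accept an extracted value only if it actually occurs in the source text. The sketch below assumes that approach; it is not taken from the paper, and the function names normalize and ground_kvps are hypothetical. It applies Unicode NFKC normalization, whitespace collapsing, and case folding before the comparison, so layout noise such as extra spaces or superscript digits does not cause false rejections.

```python
import re
import unicodedata
from typing import Dict, Tuple


def normalize(text: str) -> str:
    # NFKC folds compatibility characters (e.g. "²" -> "2", non-breaking space -> space).
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace and ignore case before comparing.
    return re.sub(r"\s+", " ", text).strip().casefold()


def ground_kvps(candidates: Dict[str, str], source: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split candidate pairs into (grounded, rejected) by presence of the value in the source."""
    haystack = normalize(source)
    grounded: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for key, value in candidates.items():
        norm = normalize(value)
        # Empty values and values absent from the document are treated as unsupported.
        (grounded if norm and norm in haystack else rejected)[key] = value
    return grounded, rejected
```

For a source reading "Reaction to fire: Class A1 / Compressive strength: 12 N/mm²", a candidate "Class A1" is kept while an invented thermal-conductivity value is rejected. The check only confirms that the value appears somewhere in the document, not that it belongs to that key, so it complements rather than replaces structural parsing.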

Innovation

Agentic system with planner-executor-responder architecture

Dynamic tool orchestration for robust reasoning

Adaptive information extraction across document modalities
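Modality detection decides which parsing tools run before extraction. A hedged sketch of that routing step follows, assuming the input is described by its MIME type plus the number of characters recovered from each page's text layer. The 50-characters-per-page threshold, the modality labels, and the tool names (ocr, layout_parser, markup_parser, kvp_extractor) are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DocInfo:
    mime_type: str
    page_text_lengths: List[int]  # characters recovered from each page's text layer


def detect_modality(doc: DocInfo, min_chars_per_page: int = 50) -> str:
    if doc.mime_type.startswith("image/"):
        return "scanned"
    if doc.mime_type == "application/pdf":
        pages = doc.page_text_lengths
        # Little or no text layer usually means an image-only (scanned) PDF.
        if not pages or sum(pages) / len(pages) < min_chars_per_page:
            return "scanned"
        return "digital_pdf"
    if doc.mime_type in ("text/html", "application/xml", "text/xml"):
        return "markup"
    return "unknown"


# Hypothetical tool chains per modality; extraction always runs last.
ROUTES: Dict[str, List[str]] = {
    "scanned": ["ocr", "layout_parser", "kvp_extractor"],
    "digital_pdf": ["layout_parser", "kvp_extractor"],
    "markup": ["markup_parser", "kvp_extractor"],
}


def route(doc: DocInfo) -> List[str]:
    modality = detect_modality(doc)
    if modality not in ROUTES:
        raise ValueError(f"unsupported modality for {doc.mime_type}")
    return list(ROUTES[modality])
```

Failing loudly on an unknown modality, instead of falling back to a default parser, keeps unsupported inputs visible in the trace rather than silently producing low-quality extractions.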
