🤖 AI Summary
Declarations of Performance (DoPs), which EU regulation requires for construction products, vary widely in format, language, and structure. This heterogeneity undermines the robustness and reliability of existing static or LLM-only information extraction methods, which often hallucinate. To address this, the authors propose an adaptive agent system tailored to regulatory documents, built on a stateful “Plan–Execute–Respond” architecture that dynamically orchestrates document parsing and domain-specific tools. The system infers user intent and detects document modality, which lets it handle different formats and languages in both extraction and question answering. Dynamic tool orchestration keeps the reasoning traceable and aligned with user intent while avoiding tool misuse and execution loops. Evaluated on a curated DoP dataset, the method shows improved robustness across formats and languages in key-value pair extraction and QA tasks. The authors present it as a scalable, traceable approach to structured data extraction in regulated workflows.
📝 Abstract
Declaration of Performance (DoP) documents, mandated by EU regulation, certify the performance of construction products. While some of their content is standardized, DoPs vary widely in layout, language, schema, and format, posing challenges for automated key-value pair (KVP) extraction and question answering (QA). Existing static or LLM-only information extraction (IE) pipelines often hallucinate and fail to adapt to this structural diversity. Our domain-specific, stateful agentic system addresses these challenges through a planner-executor-responder architecture. The system infers user intent, detects document modality, and orchestrates tools dynamically for robust, traceable reasoning while avoiding tool misuse or execution loops. Evaluation on a curated DoP dataset demonstrates improved robustness across formats and languages, offering a scalable solution for structured data extraction in regulated workflows.
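
The planner-executor-responder flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`AgentState`, `TOOLS`, `MAX_STEPS`, `run_agent`, and the helper functions) is hypothetical, the intent and modality checks are toy heuristics standing in for whatever models the real system uses, and the OCR tool is a placeholder. The sketch only shows how shared state, a tool registry, a step budget, and an execution trace might fit together.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

MAX_STEPS = 8  # hypothetical budget that guards against execution loops


@dataclass
class AgentState:
    """Shared state passed through the plan, execute, and respond stages."""
    query: str
    document: str
    intent: str = ""
    modality: str = ""
    plan: list[str] = field(default_factory=list)
    facts: dict[str, str] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


def detect_modality(document: str) -> str:
    # Toy heuristic: an empty text layer stands in for a scanned document.
    return "text" if document.strip() else "scanned"


def infer_intent(query: str) -> str:
    # Toy heuristic; a real system would likely use an LLM or a classifier.
    return "kvp" if "extract" in query.lower() else "qa"


def parse_text(state: AgentState) -> None:
    # Collect "key: value" lines into the shared fact store.
    for line in state.document.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            state.facts[key.strip()] = value.strip()


def run_ocr(state: AgentState) -> None:
    # Placeholder: no OCR engine is wired into this sketch.
    state.trace.append("ocr: no engine configured in this sketch")


TOOLS: dict[str, Callable[[AgentState], None]] = {
    "parse_text": parse_text,
    "run_ocr": run_ocr,
}


def plan(state: AgentState) -> None:
    state.modality = detect_modality(state.document)
    state.intent = infer_intent(state.query)
    state.plan = ["run_ocr"] if state.modality == "scanned" else ["parse_text"]


def execute(state: AgentState) -> None:
    for step, name in enumerate(state.plan):
        if step >= MAX_STEPS:
            state.trace.append("stopped: step budget exhausted")
            break
        tool = TOOLS.get(name)
        if tool is None:
            # Unknown tools are skipped and logged rather than guessed at.
            state.trace.append(f"skipped unknown tool: {name}")
            continue
        tool(state)
        state.trace.append(f"ran {name}")


def respond(state: AgentState) -> dict[str, str] | str:
    if state.intent == "kvp":
        return dict(state.facts)
    query = state.query.lower()
    for key, value in state.facts.items():
        if key.lower() in query:
            return value
    return "not found"


def run_agent(query: str, document: str) -> tuple[dict[str, str] | str, list[str]]:
    state = AgentState(query=query, document=document)
    plan(state)
    execute(state)
    return respond(state), state.trace
```

Calling `run_agent("What is the reaction to fire?", "Product: Mineral wool\nReaction to fire: A1")` returns the answer together with the list of tools that ran, which is the kind of trace that makes the reasoning auditable. Keeping the tool registry explicit means the executor can only call known tools, one simple way to limit tool misuse.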