RegReAct: Self-Correcting Multi-Agent Pipelines for Structured Regulatory Information Extraction

📅 2026-04-13

🤖 AI Summary

This work addresses three problems in extracting structured compliance requirements from regulatory documents: model hallucination, loss of hierarchical relationships, and unresolved cross-document dependencies. The authors propose RegReAct, a seven-stage self-correcting multi-agent framework in which each stage runs an Observe–Diagnose–Repair (ODR) loop that validates its output against the source text. The loop corrects model hallucinations and also detects and corrects erroneous cross-references in the regulations themselves. A typed criterion graph preserves structural fidelity, and externally referenced legal content is retrieved, summarized, and embedded inline so that the outputs are self-contained. Applied to three EU Taxonomy Delegated Acts, the framework yields a dataset of 242 activities with over 4,800 hierarchical criteria, thresholds, and enriched source summaries, and it outperforms a GPT-4o single-pass baseline across all structural and semantic metrics.
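
As a rough illustration of the Observe–Diagnose–Repair idea, the following minimal Python sketch runs one stage as a bounded loop. It is not the authors' implementation: the function names (run_odr_stage, observe, diagnose, repair), the StageResult type, the max_rounds cap, and the toy clause-id check are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical observation format: (clause id, was it found in the source?)
Observation = Tuple[str, bool]


@dataclass
class StageResult:
    output: List[str]      # clause ids kept after repair
    unresolved: List[str]  # issues still open when the loop stopped
    rounds: int            # number of repair rounds performed


def run_odr_stage(
    source: str,
    draft: List[str],
    observe: Callable[[str, List[str]], List[Observation]],
    diagnose: Callable[[List[Observation]], List[str]],
    repair: Callable[[List[str], List[str]], List[str]],
    max_rounds: int = 3,  # assumed cap so the loop always terminates
) -> StageResult:
    """Validate a draft against the source and repair it until no issues remain."""
    output, rounds = list(draft), 0
    while True:
        issues = diagnose(observe(source, output))
        if not issues or rounds >= max_rounds:
            return StageResult(output, issues, rounds)
        output = repair(output, issues)
        rounds += 1


def observe(source: str, output: List[str]) -> List[Observation]:
    # Toy check: a clause id counts as grounded if "(id)" occurs in the source.
    known = set(re.findall(r"\(([a-z])\)", source))
    return [(cid, cid in known) for cid in output]


def diagnose(observations: List[Observation]) -> List[str]:
    # Anything not found in the source is treated as a hallucinated clause.
    return [cid for cid, found in observations if not found]


def repair(output: List[str], issues: List[str]) -> List[str]:
    # Simplest possible repair: drop the flagged clause ids.
    return [cid for cid in output if cid not in issues]


# Invented source sentence and draft, used only to exercise the loop.
source = "The activity complies with (a) an emissions limit and (b) a reporting duty."
result = run_odr_stage(source, ["a", "b", "c"], observe, diagnose, repair)
print(result)
```

In this toy run the draft contains a clause "c" that the source never mentions, so one repair round removes it and the second observation pass finds nothing left to fix. In a real pipeline the three callables would presumably wrap model calls and source lookups rather than a regular expression.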

📝 Abstract

Extracting structured, machine-readable compliance criteria from regulatory documents remains an open challenge. Single-pass language models hallucinate structural elements, lose hierarchical relationships, and fail to resolve inter-document dependencies. We introduce RegReAct, a self-correcting multi-agent framework that decomposes regulatory information extraction into seven specialized stages, each with an Observe–Diagnose–Repair (ODR) loop that validates outputs against the source, correcting not only model hallucinations but also cross-reference errors in the regulations themselves. To ensure structural accuracy, RegReAct constructs a typed criterion graph; to ensure completeness, it resolves external dependencies by retrieving, summarizing, and embedding referenced legal content inline, producing self-contained outputs. Applying RegReAct to three EU Taxonomy Delegated Acts, we construct a dataset comprising 242 activities with over 4,800 hierarchical criteria, thresholds, and enriched source summaries. Evaluation against a GPT-4o single-pass baseline confirms that RegReAct outperforms it across all structural and semantic metrics. Code and data will be made publicly available: https://github.com/RECOR-Benchmark/RECOR
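
The abstract does not specify the schema of the typed criterion graph, so the sketch below only shows one plausible shape for such a structure. The node kinds ("activity", "criterion", "threshold"), the relation names ("has_child", "has_threshold", "refers_to"), and all identifiers are hypothetical. It also shows how a cross-reference to a missing node could be flagged for a later repair step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (source node id, relation name, target node id); relation names are assumptions.
Edge = Tuple[str, str, str]


@dataclass
class Node:
    node_id: str
    kind: str       # hypothetical kinds: "activity", "criterion", "threshold"
    text: str = ""


@dataclass
class CriterionGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def children(self, node_id: str, relation: str = "has_child") -> List[str]:
        # Follow edges of one relation type, preserving insertion order.
        return [d for s, r, d in self.edges if s == node_id and r == relation]

    def dangling_edges(self) -> List[Edge]:
        # Edges whose endpoint is not a known node, e.g. a bad cross-reference.
        return [e for e in self.edges
                if e[0] not in self.nodes or e[2] not in self.nodes]


# Placeholder content only; none of these ids or texts come from the dataset.
graph = CriterionGraph()
graph.add_node(Node("act-1", "activity", "placeholder activity"))
graph.add_node(Node("act-1/c1", "criterion", "placeholder criterion"))
graph.add_node(Node("act-1/c1/t1", "threshold", "placeholder threshold"))
graph.add_edge("act-1", "has_child", "act-1/c1")
graph.add_edge("act-1/c1", "has_threshold", "act-1/c1/t1")
graph.add_edge("act-1/c1", "refers_to", "act-9/c4")  # target was never added

print(graph.children("act-1"))
print(graph.dangling_edges())
```

Keeping relations as explicit typed edges, rather than nesting criteria inside each other, means hierarchy and cross-references live in one structure and can be checked with the same traversal.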

Problem

Research questions and friction points this paper is trying to address.

regulatory information extraction

structured compliance criteria

multi-agent systems

hallucination

inter-document dependencies

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework

self-correcting

structured information extraction