AI Summary
This work presents RegGuard, an AI assistant for pharmaceutical regulatory compliance. It targets regulatory requirements that are frequently updated, heterogeneously formatted, and spread across jurisdictions, which makes them costly and error-prone to manage manually. The system ingests multi-source regulatory documents through a secure data pipeline and introduces HiSACC, a novel hierarchical semantic chunking method that preserves semantic coherence across non-contiguous text segments. It further improves retrieval relevance with ReLACE, a domain-adapted, listwise adaptive cross-encoder for reranking. Designed for auditability, traceability, and incremental updates, the system significantly improves response relevance, factual accuracy, and contextual focus in enterprise deployment while mitigating hallucination risk, meeting the stringent demands of high-compliance environments.
Abstract
The increasing frequency and complexity of regulatory updates present a significant burden for multinational pharmaceutical companies. Compliance teams must interpret evolving rules across jurisdictions, formats, and agencies, often manually, at high cost and risk of error. We introduce RegGuard, an industrial-scale AI assistant designed to automate the interpretation of heterogeneous regulatory texts and align them with internal corporate policies. The system ingests diverse document sources through a secure pipeline and enhances retrieval and generation quality with two novel components. HiSACC (Hierarchical Semantic Aggregation for Contextual Chunking) segments long documents into semantically coherent units while maintaining consistency across non-contiguous sections. ReLACE (Regulatory Listwise Adaptive Cross-Encoder for Reranking), a domain-adapted cross-encoder built on an open-source model, jointly models user queries and retrieved candidates to improve ranking relevance. Evaluations in enterprise settings demonstrate that RegGuard improves answer quality, specifically relevance, groundedness, and contextual focus, while significantly mitigating hallucination risk. The system architecture is built for auditability and traceability, featuring provenance tracking, access control, and incremental indexing. This makes it responsive to evolving document sources and relevant to any domain with stringent compliance demands.
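The abstract does not specify how HiSACC works internally, so the following is only a toy illustration of the general idea of grouping semantically related sentences even when they are not adjacent. It is not the authors' method. The function names (`tokens`, `jaccard`, `chunk_by_similarity`), the `threshold` value, and the use of word-set overlap in place of a learned embedding are all assumptions made for this sketch.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a learned sentence embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Set-overlap similarity in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def chunk_by_similarity(sentences: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Greedily assign each sentence to the most similar existing chunk.

    Chunks hold sentence indices, so one chunk may gather sentences that are
    not adjacent in the source. The indices also act as simple provenance.
    The threshold is a hypothetical value chosen for this example.
    """
    chunks: list[list[int]] = []
    chunk_vocab: list[set[str]] = []
    for i, sentence in enumerate(sentences):
        toks = tokens(sentence)
        best, best_score = -1, 0.0
        for c, vocab in enumerate(chunk_vocab):
            score = jaccard(toks, vocab)
            if score > best_score:
                best, best_score = c, score
        if best >= 0 and best_score >= threshold:
            # Similar enough: join the existing chunk and widen its vocabulary.
            chunks[best].append(i)
            chunk_vocab[best] |= toks
        else:
            # Nothing similar enough: start a new chunk.
            chunks.append([i])
            chunk_vocab.append(set(toks))
    return chunks


sentences = [
    "Batch release requires a signed certificate of analysis.",
    "Adverse events must be reported within fifteen days.",
    "The certificate of analysis must be signed before batch release.",
]
print(chunk_by_similarity(sentences))  # → [[0, 2], [1]]
```

In this example the first and third sentences land in the same chunk although an unrelated sentence sits between them. A production system would presumably replace the word-overlap score with embedding similarity and add a hierarchical aggregation step, neither of which is shown here.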