🤖 AI Summary
This work addresses the risk of unauthorized content dissemination in multi-agent knowledge ecosystems arising from semantic publish-subscribe mechanisms that overlook data governance rules. The authors propose a novel subscription framework that integrates semantic vector matching with multidimensional legal compliance policies, covering processing levels, direct marketing restrictions, training opt-outs, jurisdictional constraints, and research-purpose limitations. For the first time, regulatory requirements from the EU's DSM Directive and AI Act are encoded as policy predicates embedded directly into the vector retrieval pipeline, so that notifications are triggered only when both the semantic similarity threshold and all applicable policy constraints are simultaneously satisfied. Implemented within the AIngram system and evaluated on the PASA benchmark and a synthetic corpus (1,000 chunks, 93 subscriptions, 5 domains), the approach enforces every governance constraint while preserving delivery of authorized content; ablation studies confirm that no single policy dimension suffices for full regulatory compliance.
📝 Abstract
As AI agent ecosystems grow, agents need mechanisms to monitor relevant knowledge in real time. Semantic publish-subscribe systems address this by matching new content against vector subscriptions. However, in multi-agent settings where agents operate under different data handling policies, unrestricted semantic subscriptions create policy violations: agents receive notifications about content they are not authorized to access. We introduce governance-aware vector subscriptions, a mechanism that composes semantic similarity matching with multi-dimensional policy predicates grounded in regulatory frameworks (EU DSM Directive, EU AI Act). The policy predicate operates over multiple independent dimensions (processing level, direct marketing restrictions, training opt-out, jurisdiction, and scientific usage), each with a distinct legal basis. Agents subscribe to semantic regions of a curated knowledge base; notifications are dispatched only for validated content that passes both the similarity threshold and all applicable policy constraints. We formalize the mechanism, implement it within AIngram (an operational multi-agent knowledge base), and evaluate it using the PASA benchmark. We validate the mechanism on a synthetic corpus (1,000 chunks, 93 subscriptions, 5 domains): the governed mode correctly enforces all policy constraints while preserving delivery of authorized content. Ablation across five policy dimensions shows that no single dimension suffices for full compliance.
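The dispatch rule described above (notify only when the similarity threshold *and* every applicable policy dimension pass) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the class names (`ContentPolicy`, `AgentProfile`, `Subscription`), the field names for the five dimensions, and the direction of each check are assumptions made for the example.

```python
import math
from dataclasses import dataclass

# Illustrative sketch only: names and field semantics below are assumed,
# not taken from the AIngram implementation.

@dataclass(frozen=True)
class ContentPolicy:
    processing_level: int             # minimum clearance required
    no_direct_marketing: bool         # direct-marketing restriction
    training_opt_out: bool            # content opted out of model training
    allowed_jurisdictions: frozenset  # e.g. frozenset({"EU"})
    research_only: bool               # scientific-research-purpose limit

@dataclass(frozen=True)
class AgentProfile:
    clearance_level: int
    does_direct_marketing: bool
    trains_models: bool
    jurisdiction: str
    purpose: str                      # e.g. "research" or "commercial"

@dataclass(frozen=True)
class Subscription:
    agent: AgentProfile
    query_vec: tuple
    threshold: float                  # minimum cosine similarity to notify

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def policy_allows(content: ContentPolicy, agent: AgentProfile) -> bool:
    """Conjunction over the five dimensions: one failing check blocks delivery."""
    return (
        agent.clearance_level >= content.processing_level
        and not (content.no_direct_marketing and agent.does_direct_marketing)
        and not (content.training_opt_out and agent.trains_models)
        and agent.jurisdiction in content.allowed_jurisdictions
        and (not content.research_only or agent.purpose == "research")
    )

def match(chunk_vec, content: ContentPolicy, subs):
    """Notify only subscribers passing BOTH the semantic threshold
    and every applicable policy constraint."""
    return [
        s for s in subs
        if cosine(chunk_vec, s.query_vec) >= s.threshold
        and policy_allows(content, s.agent)
    ]
```

The point of the conjunction is exactly what the ablation measures: dropping any one of the five checks readmits a class of non-compliant notifications, so no single dimension alone achieves full compliance.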