AI Summary
This work addresses the vulnerability of natural-language communication in multi-agent systems to semantic drift, as well as the lack of interpretability and semantic consistency in existing learned protocols. The authors propose the first communication framework to support verifiable semantics, grounded in a stimulus-meaning model. The framework introduces a terminology certification protocol that validates agents' shared understanding of terms through jointly observable events, and incorporates a core-guarded reasoning mechanism that permits only certified terms in inference. By integrating statistical hypothesis testing, drift detection, and vocabulary renegotiation, the framework provides bounded guarantees on semantic divergence. Experiments demonstrate a 72%–96% reduction in semantic divergence in simulated environments and a 51% reduction with fine-tuned language models, significantly enhancing communication consistency and reliability.
Abstract
Multi-agent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms they use. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model: agents are tested on shared observable events, and a term is certified if empirical disagreement falls below a statistical threshold. Under this protocol, agents that restrict their reasoning to certified terms ("core-guarded reasoning") achieve provably bounded disagreement. We also outline mechanisms for detecting drift (recertification) and recovering shared vocabulary (renegotiation). In simulations with varying degrees of semantic divergence, core-guarding reduces disagreement by 72–96%; in a validation with fine-tuned language models, it reduces disagreement by 51%. Our framework provides a first step toward verifiable agent-to-agent communication.
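The certification step described above can be sketched concretely. The following is a minimal, hypothetical illustration (not the authors' implementation): two agents label the same jointly observed events, and a term is certified only if a one-sided binomial tail test rejects the hypothesis that their true disagreement rate is at or above a threshold `epsilon`. The function name, parameters, and significance level are illustrative assumptions.

```python
import math

def certify_term(labels_a, labels_b, epsilon=0.1, alpha=0.05):
    """Certify a term if the agents' empirical disagreement on jointly
    observed events is statistically below epsilon.

    Illustrative sketch: one-sided binomial test of
    H0: true disagreement rate >= epsilon, rejected when the lower
    tail probability of the observed disagreement count is < alpha.
    """
    n = len(labels_a)
    # Count events the two agents label differently.
    k = sum(a != b for a, b in zip(labels_a, labels_b))
    # P(X <= k) under Binomial(n, epsilon): how likely is a disagreement
    # count this low if the true rate were actually epsilon?
    p_value = sum(math.comb(n, i) * epsilon**i * (1 - epsilon)**(n - i)
                  for i in range(k + 1))
    return p_value < alpha

# Agents agree on 98 of 100 jointly observed events: certified.
print(certify_term([1] * 100, [1] * 98 + [0] * 2))  # True
```

With only a handful of observations the test cannot rule out a high disagreement rate, so the term stays uncertified; this conservatism is what makes restricting inference to certified terms ("core-guarded reasoning") safe.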