🤖 AI Summary
Current vision-language models struggle to distinguish semantic similarity from factual existence and cannot explicitly represent negative constraints, which leaves multimodal reasoning unverifiable. This work proposes PolarMem, a training-free polarized latent graph memory that, for the first time, explicitly models negated facts as core cognitive states. By combining non-parametric distributional partitioning with a polarized graph structure that includes inhibitory connections, the method transforms ambiguous perceptual likelihoods into discrete logical constraints, enabling logic-driven, verifiable retrieval. Experiments across eight frozen vision-language models and six benchmarks demonstrate that the approach significantly suppresses hallucinations and enhances the verifiability and robustness of multimodal agent reasoning.
📝 Abstract
As multimodal agents evolve from passive observers to long-horizon decision-makers, they require memory systems that provide not just information availability but logical verifiability. A fundamental limitation of current architectures is the epistemic asymmetry inherent in probabilistic vision-language models and dense associative memories: they conflate semantic affinity with factual existence and structurally fail to encode negative constraints. To this end, we introduce PolarMem, a training-free Polarized Latent Graph Memory designed to ground agent reasoning in verifiable evidence. PolarMem transforms fuzzy perceptual likelihoods into discrete logical constraints through non-parametric distributional partitioning. Furthermore, it employs a polarized graph topology with orthogonal inhibitory connections to explicitly store verified negation as a primary cognitive state. At inference time, we enforce a logic-dominant retrieval paradigm, suppressing hallucinatory patterns that violate negative constraints. Extensive evaluation across eight frozen Vision-Language Models and six benchmarks demonstrates that PolarMem functions as a robust cognitive system, establishing a foundation for verifiable multimodal agents. Our code is available at https://github.com/czs-ict/PolarMem.
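The three ideas named in the abstract (partitioning scores without a tuned threshold, storing negation as its own edge type, and letting negative constraints override similarity at retrieval) can be illustrated with a toy sketch. This is not the PolarMem implementation: the largest-gap split, the `PolarizedMemory` class, and every name below are assumptions made up for illustration, and the released code may work quite differently.

```python
from dataclasses import dataclass, field


def partition_by_largest_gap(scores):
    """Split labels into (present, absent) at the widest gap between sorted scores.

    Hypothetical stand-in for a non-parametric partition: no threshold is
    tuned by hand; the cut point comes from the score distribution itself.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return [label for label, _ in ranked], []
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    present = [label for label, _ in ranked[:cut]]
    absent = [label for label, _ in ranked[cut:]]
    return present, absent


@dataclass
class PolarizedMemory:
    """Toy memory with two edge types per scene (illustrative assumption)."""

    positive: dict = field(default_factory=dict)  # scene -> labels judged present
    negative: dict = field(default_factory=dict)  # scene -> labels judged absent

    def write(self, scene, scores):
        """Discretize perceptual scores and store both polarities."""
        present, absent = partition_by_largest_gap(scores)
        self.positive.setdefault(scene, set()).update(present)
        self.negative.setdefault(scene, set()).update(absent)

    def retrieve(self, scene, candidates):
        """Rank candidates by similarity, then drop any that hit a negative edge."""
        blocked = self.negative.get(scene, set())
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        return [label for label, _ in ranked if label not in blocked]


memory = PolarizedMemory()
memory.write("kitchen", {"mug": 0.91, "kettle": 0.84, "dog": 0.22, "surfboard": 0.05})

# "dog" has the highest similarity to the query, but the stored negation vetoes it.
print(memory.retrieve("kitchen", {"dog": 0.80, "mug": 0.70, "kettle": 0.60}))
```

In this sketch the negative edge acts as a hard veto rather than a soft penalty, which is one simple way to read "logic-dominant retrieval"; a real system would also need a verification step before committing a negation to memory.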