🤖 AI Summary
To address key bottlenecks in Chinese open-source intelligence (OSINT), including difficulty in fusing heterogeneous multimodal data, weak contextual reasoning, and low operational utility, this paper proposes COSINT-Agent, a knowledge-driven multimodal agent built around EES-Match, a framework that bridges a fine-tuned multimodal large language model (COSINT-MLLM) with a dynamic Entity-Event-Scene knowledge graph (EES-KG). The framework establishes a closed-loop, knowledge-augmented reasoning paradigm that carries multimodal perception through to structured decision-making. Core technical contributions include: (1) construction and efficient retrieval of EES-KG; (2) cross-modal alignment and matching; and (3) a structured reasoning engine. Experimental results show that COSINT-Agent significantly outperforms state-of-the-art methods on Chinese OSINT tasks, including entity recognition, EES generation, and context matching, while achieving high accuracy, strong generalizability across domains, and inherent interpretability.
📝 Abstract
Open Source Intelligence (OSINT) requires integrating and reasoning over diverse multimodal data, presenting significant challenges in deriving actionable insights. Traditional approaches, including multimodal large language models (MLLMs), often struggle to infer complex contextual relationships or deliver comprehensive intelligence from unstructured data sources. In this paper, we introduce COSINT-Agent, a knowledge-driven multimodal agent tailored to address the challenges of OSINT in the Chinese domain. COSINT-Agent integrates the perceptual capabilities of fine-tuned MLLMs with the structured reasoning power of the Entity-Event-Scene Knowledge Graph (EES-KG). Central to COSINT-Agent is the EES-Match framework, which bridges COSINT-MLLM and EES-KG, enabling systematic extraction, reasoning, and contextualization of multimodal insights. This integration facilitates precise entity recognition, event interpretation, and context retrieval, transforming raw multimodal data into actionable intelligence. Extensive experiments validate the superior performance of COSINT-Agent across core OSINT tasks, including entity recognition, EES generation, and context matching. These results underscore its potential as a robust and scalable solution for advancing automated multimodal reasoning and enhancing the effectiveness of OSINT methodologies.