COSINT-Agent: A Knowledge-Driven Multimodal Agent for Chinese Open Source Intelligence

📅 2025-03-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key bottlenecks in Chinese open-source intelligence (OSINT), including difficulty in fusing heterogeneous multimodal data, weak contextual reasoning, and low operational utility, this paper proposes COSINT-Agent, a knowledge-driven multimodal agent that couples a fine-tuned multimodal large language model (COSINT-MLLM) with a dynamic Entity-Event-Scene knowledge graph (EES-KG) through the novel EES-Match framework. The agent establishes a closed-loop, knowledge-augmented reasoning paradigm that bridges multimodal perception and structured decision-making. Core technical contributions include: (1) construction and efficient retrieval of the EES-KG; (2) cross-modal alignment and matching via EES-Match; and (3) a structured reasoning engine. Experimental results demonstrate that COSINT-Agent significantly outperforms state-of-the-art methods on Chinese OSINT tasks, including entity recognition, EES-triple generation, and context matching, while achieving high accuracy, strong cross-domain generalizability, and inherent interpretability.

📝 Abstract
Open Source Intelligence (OSINT) requires the integration and reasoning of diverse multimodal data, presenting significant challenges in deriving actionable insights. Traditional approaches, including multimodal large language models (MLLMs), often struggle to infer complex contextual relationships or deliver comprehensive intelligence from unstructured data sources. In this paper, we introduce COSINT-Agent, a knowledge-driven multimodal agent tailored to address the challenges of OSINT in the Chinese domain. COSINT-Agent seamlessly integrates the perceptual capabilities of fine-tuned MLLMs with the structured reasoning power of the Entity-Event-Scene Knowledge Graph (EES-KG). Central to COSINT-Agent is the innovative EES-Match framework, which bridges COSINT-MLLM and EES-KG, enabling systematic extraction, reasoning, and contextualization of multimodal insights. This integration facilitates precise entity recognition, event interpretation, and context retrieval, effectively transforming raw multimodal data into actionable intelligence. Extensive experiments validate the superior performance of COSINT-Agent across core OSINT tasks, including entity recognition, EES generation, and context matching. These results underscore its potential as a robust and scalable solution for advancing automated multimodal reasoning and enhancing the effectiveness of OSINT methodologies.
Problem

Research questions and friction points this paper is trying to address.

Fusing heterogeneous multimodal data into actionable OSINT insights.
Inferring complex contextual relationships that MLLMs alone struggle to capture.
Improving entity recognition and event interpretation in the Chinese OSINT domain.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates a fine-tuned COSINT-MLLM with the EES-KG knowledge graph
Bridges perception and structured reasoning via the EES-Match framework
Improves entity recognition, event interpretation, and context retrieval
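To make the Entity-Event-Scene idea concrete, here is a minimal, hypothetical sketch of an EES triple and a naive context-matching step. The class and function names are illustrative assumptions, not the paper's actual implementation; the real EES-Match performs cross-modal alignment over a knowledge graph rather than exact string comparison.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EESTriple:
    entity: str  # a recognized person, place, or organization
    event: str   # the action or occurrence involving the entity
    scene: str   # the contextual or visual setting of the event

def match_context(query: EESTriple, kg: list[EESTriple]) -> list[EESTriple]:
    """Return KG triples sharing the query's entity or event.

    A stand-in for EES-Match: real retrieval would use embedding
    similarity and graph traversal, not exact matching.
    """
    return [t for t in kg if t.entity == query.entity or t.event == query.event]

# Toy knowledge graph (fabricated examples for illustration only)
kg = [
    EESTriple("Shandong University", "conference opening", "auditorium"),
    EESTriple("Shandong University", "campus tour", "outdoor plaza"),
    EESTriple("Jinan Metro", "line extension", "construction site"),
]
query = EESTriple("Shandong University", "conference opening", "unknown")
hits = match_context(query, kg)  # retrieves the two university triples
```

The point of the structure is that once an MLLM extracts an EES triple from raw multimodal input, context retrieval reduces to a graph query over shared entities and events.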
Wentao Li
School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
Congcong Wang
School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
Xiaoxiao Cui
Shandong University
Zhi Liu
School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
Wei Guo
School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, China
Lizhen Cui
Shandong University
Databases · Big Data · Artificial Intelligence · Data Mining · Cloud Computing