🤖 AI Summary
Humanitarian response and conflict early warning require timely, accurate situational awareness, yet manual analysis of heterogeneous multi-source data (e.g., news, conflict databases, economic indicators) is slow and labor-intensive. This paper proposes a dynamic Retrieval-Augmented Generation (RAG) system tailored for peacebuilding that constructs query-driven knowledge bases and fuses multi-source data in real time, integrating semantic retrieval, fact-consistency verification, and an LLM-as-a-Judge evaluation mechanism. A three-level evaluation framework, combining automated NLP metrics, domain expert review, and LLM-based adjudication, assesses report coherence and operational utility. In real-world deployment, the system reduces analysis cycle time by over 60% and substantially alleviates analyst workload. All code, datasets, and evaluation tools are publicly released, demonstrating the feasibility, robustness, and practical deployability of automated situational awareness.
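The query-driven pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `build_knowledge_base`, `retrieve`, and the keyword-overlap scoring are hypothetical stand-ins (the real system uses embedding-based semantic retrieval over live news, conflict-event, and economic feeds).

```python
from dataclasses import dataclass

@dataclass
class Document:
    source: str  # e.g. "news", "conflict_db", "economic"
    text: str

def build_knowledge_base(query, fetchers):
    """Assemble a query-specific knowledge base on demand by pulling
    fresh documents from each registered source fetcher."""
    return [doc for fetch in fetchers for doc in fetch(query)]

def retrieve(query, kb, k=3):
    """Rank documents by a toy relevance score (keyword overlap).
    A deployed system would use semantic (embedding) similarity here."""
    terms = set(query.lower().split())
    scored = sorted(
        kb,
        key=lambda d: len(terms & set(d.text.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The top-k retrieved documents would then be passed to the LLM as context for report generation; separating `build_knowledge_base` from `retrieve` reflects the system's on-demand construction step, which keeps the index fresh for each query.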
📝 Abstract
Timely and accurate situation awareness is vital for decision-making in humanitarian response, conflict monitoring, and early warning and early action. However, manual analysis of vast and heterogeneous data sources often introduces delays that limit the effectiveness of interventions. This paper introduces a dynamic Retrieval-Augmented Generation (RAG) system that autonomously generates situation awareness reports by integrating real-time data from diverse sources, including news articles, conflict event databases, and economic indicators. Our system constructs query-specific knowledge bases on demand, ensuring timely, relevant, and accurate insights. To ensure the quality of generated reports, we propose a three-level evaluation framework that combines automated NLP metrics, human expert review, and LLM-based assessment. The first level employs automated NLP metrics to assess coherence and factual consistency. The second level involves human expert evaluation to verify the relevance and completeness of the reports. The third level utilizes LLM-as-a-Judge, in which large language models provide an additional layer of assessment to ensure robustness. The system is tested across multiple real-world scenarios, demonstrating its effectiveness in producing coherent, insightful, and actionable reports. By automating report generation, our approach reduces the burden on human analysts and accelerates decision-making. To promote reproducibility and further research, we openly share our code and evaluation tools with the community via GitHub.
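The three-level evaluation framework might be wired together as in the sketch below. The names (`token_f1`, `evaluate_report`) and the token-overlap metric are illustrative assumptions: level 1 here uses a simple F1 stand-in for the paper's semantic-similarity and factual-consistency metrics, level 2 takes a human rating as input, and level 3 wraps the LLM-as-a-Judge call behind a plain callable (stubbed here rather than a real API).

```python
def token_f1(report, reference):
    """Level 1: automated NLP metric. Token-overlap F1 stands in for
    the semantic-similarity and factual-consistency checks."""
    r, g = set(report.lower().split()), set(reference.lower().split())
    overlap = len(r & g)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(r), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def evaluate_report(report, reference, expert_score, llm_judge):
    """Combine the three evaluation levels into one record.
    `expert_score` is a level-2 human rating in [0, 1];
    `llm_judge` is a level-3 callable wrapping an LLM prompt."""
    return {
        "nlp_metric": token_f1(report, reference),
        "expert": expert_score,
        "llm_judge": llm_judge(report),
    }
```

Keeping the three levels as independent fields, rather than collapsing them into a single score, mirrors the framework's intent: each layer catches failure modes the others miss (surface fluency, domain relevance, and robustness respectively).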