Continuous Monitoring of Large-Scale Generative AI via Deterministic Knowledge Graph Structures

📅 2025-09-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generative AI suffers from reliability issues (hallucinations, semantic drift, and bias), exacerbated by its opaque "black-box" nature, which forces evaluation to rely on subjective human judgment that scales poorly. To address this, we propose a dual-track, graph-based automated monitoring framework: (1) a deterministic knowledge graph constructed from domain ontologies, curated lexicons, and rule-based extraction engines; and (2) an LLM-generated graph derived dynamically from real-time text streams. By continuously comparing these graphs across structural and semantic metrics, including Instantiated Class Ratio (ICR), Instantiated Property Ratio (IPR), and Class Instantiation (CI), the method enables real-time detection of hallucinations and semantic drift. Crucially, the use of live data streams mitigates memorization-induced bias, since adaptive models cannot game benchmarks they have not seen, enhancing the objectivity, transparency, and scalability of assessment. The framework establishes a quantifiable, interpretable paradigm for the continuous, trustworthy deployment of generative AI systems.
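As a rough illustration of the structural metrics named above, here is a minimal sketch assuming a KG schema represented as plain Python sets; the metric definitions follow common ontology-metrics usage and are not necessarily the paper's exact formulas:

```python
def icr(instantiated_classes: set, all_classes: set) -> float:
    """Instantiated Class Ratio: fraction of schema classes with at least one instance."""
    return len(instantiated_classes & all_classes) / len(all_classes)

def ipr(instantiated_properties: set, all_properties: set) -> float:
    """Instantiated Property Ratio: fraction of schema properties actually used in the data."""
    return len(instantiated_properties & all_properties) / len(all_properties)

# Hypothetical schema and usage drawn from a news-domain ontology
classes = {"Person", "Organization", "Event", "Location"}
props = {"worksFor", "locatedIn", "attended"}
used_classes = {"Person", "Organization"}
used_props = {"worksFor"}

print(icr(used_classes, classes))  # 0.5
print(ipr(used_props, props))      # 0.333...
```

A sudden drop in either ratio for the LLM-generated graph, relative to the deterministic one, would signal that the model's output stopped covering the expected schema.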

📝 Abstract
Generative AI (GEN AI) models have revolutionized diverse application domains but present substantial challenges due to reliability concerns, including hallucinations, semantic drift, and inherent biases. These models typically operate as black-boxes, complicating transparent and objective evaluation. Current evaluation methods primarily depend on subjective human assessment, limiting scalability, transparency, and effectiveness. This research proposes a systematic methodology using deterministic and Large Language Model (LLM)-generated Knowledge Graphs (KGs) to continuously monitor and evaluate GEN AI reliability. We construct two parallel KGs: (i) a deterministic KG built using explicit rule-based methods, predefined ontologies, domain-specific dictionaries, and structured entity-relation extraction rules, and (ii) an LLM-generated KG dynamically derived from real-time textual data streams such as live news articles. Utilizing real-time news streams ensures authenticity, mitigates biases from repetitive training, and prevents adaptive LLMs from bypassing predefined benchmarks through feedback memorization. To quantify structural deviations and semantic discrepancies, we employ several established KG metrics, including Instantiated Class Ratio (ICR), Instantiated Property Ratio (IPR), and Class Instantiation (CI). An automated real-time monitoring framework continuously computes deviations between deterministic and LLM-generated KGs. By establishing dynamic anomaly thresholds based on historical structural metric distributions, our method proactively identifies and flags significant deviations, thus promptly detecting semantic anomalies or hallucinations. This structured, metric-driven comparison between deterministic and dynamically generated KGs delivers a robust and scalable evaluation framework.
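The dynamic anomaly thresholds described in the abstract, derived from historical structural-metric distributions, can be sketched as a simple z-score check. This is an illustrative stand-in; the paper's actual thresholding procedure may differ:

```python
from statistics import mean, stdev

def flag_anomaly(history: list[float], current: float, k: float = 3.0) -> bool:
    """Flag `current` if it deviates more than k standard deviations
    from the historical distribution of the metric."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > k

# Hypothetical history of ICR values computed over past news batches
icr_history = [0.82, 0.80, 0.83, 0.81, 0.79, 0.82]
print(flag_anomaly(icr_history, 0.45))  # True: possible hallucination or drift
print(flag_anomaly(icr_history, 0.81))  # False: within normal variation
```

Recomputing the threshold from a sliding window of history keeps it adaptive as the input stream evolves, which matches the abstract's emphasis on continuous monitoring.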
Problem

Research questions and friction points this paper is trying to address.

Monitoring generative AI reliability via deterministic and LLM-generated knowledge graphs
Detecting semantic anomalies and hallucinations in real-time data streams
Establishing scalable evaluation framework using structural KG metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparison of deterministic and LLM-generated Knowledge Graphs
Real-time monitoring with dynamic anomaly thresholds
Established KG metrics for structural and semantic evaluation
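The core comparison between the two graphs can be sketched as set operations over (subject, predicate, object) triples. The coverage/precision scores and the example triples below are illustrative assumptions, not the paper's exact metrics:

```python
def compare_kgs(deterministic_kg: set, llm_kg: set):
    """Compare an LLM-generated triple set against a deterministic reference."""
    overlap = deterministic_kg & llm_kg
    coverage = len(overlap) / len(deterministic_kg)   # how much of the reference is recovered
    precision = len(overlap) / len(llm_kg)            # how much of the LLM output is grounded
    hallucinated = llm_kg - deterministic_kg          # triples with no deterministic support
    return coverage, precision, hallucinated

det = {("Acme", "locatedIn", "Paris"), ("Bob", "worksFor", "Acme")}
llm = {("Acme", "locatedIn", "Paris"), ("Bob", "worksFor", "Globex")}
cov, prec, hall = compare_kgs(det, llm)
print(cov, prec)  # 0.5 0.5
print(hall)       # {('Bob', 'worksFor', 'Globex')}
```

Feeding these per-batch scores into a threshold check like the one above yields the automated flagging loop the paper describes.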