🤖 AI Summary
To address the challenge of balancing accuracy, latency, and cost in context construction for multi-hop question answering over knowledge graphs, this paper proposes a dynamic, learnable context engineering framework that models subgraph expansion as a sequential decision process under resource budget constraints. We introduce a tri-agent collaborative architecture—Subgraph Architect, Path Navigator, and Context Curator—that explicitly models query-level latency and prompt cost as configurable budgets. Furthermore, we propose LC-MAPPO, a resource-aware reinforcement learning algorithm enabling end-to-end, neural-symbolic, budget-constrained reasoning. Extensive evaluation on HotpotQA, MetaQA, and FactKG demonstrates significant improvements: on MetaQA-2hop, EM@1 increases by 39.3 points over GraphRAG, latency decreases by 18.6%, and edge growth is reduced by 40.9%. Our approach markedly enhances context compactness, traceability, and performance predictability.
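The summary frames subgraph expansion as a sequential decision process under configurable budgets. A minimal sketch of that control loop, with hypothetical names (`Budgets`, `expand_subgraph`, the acceptance threshold) that are illustrative assumptions rather than the paper's API:

```python
# Sketch: greedy subgraph expansion that stops as soon as either the
# interaction-step (latency) budget or the edge-edit budget binds.
from dataclasses import dataclass, field

@dataclass
class Budgets:
    max_steps: int       # latency budget, counted in interaction steps
    max_edge_edits: int  # subgraph-growth budget, counted in edge edits

@dataclass
class ExpansionState:
    steps: int = 0
    edge_edits: int = 0
    edges: list = field(default_factory=list)

def expand_subgraph(candidate_edges, budgets):
    """Consume (edge, score) candidates, best first, until a budget binds.

    The scoring policy that produces the candidates is outside this sketch;
    the 0.5 acceptance threshold is a placeholder.
    """
    state = ExpansionState()
    for edge, score in candidate_edges:
        if state.steps >= budgets.max_steps:
            break  # latency budget exhausted
        if state.edge_edits >= budgets.max_edge_edits:
            break  # edge-edit budget exhausted
        state.steps += 1
        if score > 0.5:
            state.edges.append(edge)
            state.edge_edits += 1
    return state
```

Because the budgets are plain parameters, a caller can tighten or loosen them per query without retraining, which is the trade-off knob the framework exposes.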
📝 Abstract
Knowledge graphs provide structured context for multi-hop question answering, but deployed systems must balance answer accuracy with strict latency and cost targets while preserving provenance. Static k-hop expansions and "think-longer" prompting often over-retrieve, inflate context, and yield unpredictable runtime. We introduce CLAUSE, a three-agent agentic neuro-symbolic framework that treats context construction as a sequential decision process over knowledge graphs, deciding what to expand, which paths to follow or backtrack, what evidence to keep, and when to stop. Latency (interaction steps) and prompt cost (selected tokens) are exposed as user-specified budgets or prices, allowing per-query adaptation to trade-offs among accuracy, latency, and cost without retraining. CLAUSE employs the proposed Lagrangian-Constrained Multi-Agent Proximal Policy Optimization (LC-MAPPO) algorithm to coordinate the three agents—Subgraph Architect, Path Navigator, and Context Curator—so that subgraph construction, reasoning-path discovery, and evidence selection are jointly optimized under per-query resource budgets on edge edits, interaction steps, and selected tokens. Across HotpotQA, MetaQA, and FactKG, CLAUSE yields higher EM@1 while reducing subgraph growth and end-to-end latency at equal or lower token budgets. On MetaQA-2hop, relative to the strongest RAG baseline (GraphRAG), CLAUSE achieves +39.3 EM@1 with 18.6% lower latency and 40.9% lower edge growth. The resulting contexts are compact, provenance-preserving, and deliver predictable performance under deployment constraints.
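The abstract describes LC-MAPPO as Lagrangian-constrained policy optimization over resource budgets. An illustrative sketch of the standard Lagrangian device behind such methods (this is a generic sketch, not the paper's implementation; function names and the learning rate are assumptions): each resource cost gets a nonnegative multiplier that acts as a price, the policy maximizes reward minus priced costs, and each multiplier rises when its average cost overshoots the budget and is projected back to zero otherwise.

```python
# Sketch: scalarized objective and projected dual ascent for
# budget-constrained policy optimization (LC-MAPPO-style device).

def penalized_reward(reward, costs, lambdas):
    """Task reward minus priced resource costs (edge edits, steps, tokens)."""
    return reward - sum(lam * c for lam, c in zip(lambdas, costs))

def dual_update(lambdas, avg_costs, budgets, lr=0.01):
    """Projected gradient ascent on the dual variables:
    lambda_k <- max(0, lambda_k + lr * (avg_cost_k - budget_k)).
    A binding constraint (cost > budget) raises its price; a slack
    one decays toward a free (zero-price) resource."""
    return [max(0.0, lam + lr * (c - b))
            for lam, c, b in zip(lambdas, avg_costs, budgets)]
```

At a saddle point the multipliers settle at prices under which the policy's expected costs respect the per-query budgets, which is what makes the accuracy/latency/cost trade-off tunable at deployment time.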