🤖 AI Summary
This work addresses a limitation of traditional judge agents, which evaluate query-response pairs in isolation and thus fail to capture cross-instance inconsistencies that hinder the reasoning optimization of the main agent. To overcome this, we propose the Judge Agent Forest (JAF) framework, which elevates judge agents from local evaluators to global learners by jointly reasoning over related query-response pairs. JAF integrates belief propagation with ensemble learning to construct a contextual neighborhood knowledge graph and introduces an interpretable, relation-aware mechanism for diverse exemplar selection, surpassing the constraints of conventional kNN-based embedding approaches. By combining in-context learning (ICL), locality-sensitive hashing (LSH), semantic embeddings, LLM-driven hash predicates, and label supervision, JAF significantly enhances the main agent's ability to refine its reasoning pathways through collective feedback, as demonstrated on large-scale cloud misconfiguration triage tasks.
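The joint-judging idea above can be illustrated with a minimal ICL sketch. This is a hypothetical illustration, not the paper's implementation: the function names (`select_peers`, `build_judge_prompt`) are invented, and peers are chosen at random here, whereas JAF uses relation-aware LSH-based selection.

```python
# Hypothetical sketch of JAF-style joint judging via in-context learning (ICL).
# The judge sees each query-response pair alongside peer exemplars, so its
# feedback can reflect cross-instance patterns rather than isolated scores.
import random


def select_peers(index, pairs, k=2, seed=0):
    """Pick k peer exemplars for pair `index`.

    Random choice here stands in for JAF's LSH-based, relation-aware
    selection; randomized repeats yield an ensemble of judgments.
    """
    rng = random.Random(seed + index)
    peers = [p for j, p in enumerate(pairs) if j != index]
    return rng.sample(peers, min(k, len(peers)))


def build_judge_prompt(query, response, peers):
    """Assemble an ICL prompt: peer exemplars first, then the target pair."""
    lines = ["You are a judge. Consider the peer examples for context."]
    for q, r in peers:
        lines.append(f"Peer query: {q}\nPeer response: {r}")
    lines.append(f"Query: {query}\nResponse: {response}\nVerdict:")
    return "\n\n".join(lines)


pairs = [("q1", "r1"), ("q2", "r2"), ("q3", "r3")]
prompt = build_judge_prompt(*pairs[0], select_peers(0, pairs))
```

Because each pair's prompt shares exemplars with its neighbors, overlapping neighborhoods induce the knowledge-graph structure over which critique propagates.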
📄 Abstract
Judge agents are fundamental to agentic AI frameworks: they provide automated evaluation and enable iterative self-refinement of reasoning processes. We introduce JAF: Judge Agent Forest, a framework in which the judge agent conducts joint inference across a cohort of query-response pairs generated by a primary agent, rather than evaluating each in isolation. This paradigm elevates the judge from a local evaluator to a holistic learner: by simultaneously assessing related responses, the judge discerns cross-instance patterns and inconsistencies, and the aggregate feedback enables the primary agent to improve by viewing its own outputs through the judge's collective perspective. Conceptually, JAF bridges belief propagation and ensemble-learning principles: overlapping in-context neighborhoods induce a knowledge-graph structure that facilitates propagation of critique, and repeated, randomized evaluations yield a robust ensemble of context-sensitive judgments. JAF can be instantiated entirely via in-context learning (ICL), with the judge prompted for each query using its associated primary-agent response plus a small, possibly noisy set of peer exemplars. While kNN in embedding space is a natural starting point for exemplar selection, it overlooks categorical structure, domain metadata, and nuanced distinctions accessible to modern LLMs. To overcome these limitations, we develop a flexible locality-sensitive hashing (LSH) algorithm that learns informative binary codes by integrating semantic embeddings, LLM-driven hash predicates, supervision from categorical labels, and relevant side information. These hash codes support efficient, interpretable, and relation-aware selection of diverse exemplars, and further optimize exploration of chain-of-thought (CoT) reasoning paths. We validate JAF with an empirical study on the demanding task of misconfiguration triage in large-scale cloud environments.
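To make the hash-code mechanism concrete, here is a minimal sketch of LSH-based diverse exemplar selection. The paper's codes are *learned* from embeddings, LLM predicates, labels, and side information; as a stand-in, this sketch uses plain random-hyperplane LSH over embeddings, and the function names and greedy diversity rule are assumptions for illustration only.

```python
# Illustrative stand-in for JAF's learned hash codes: random-hyperplane LSH
# over semantic embeddings, plus a greedy Hamming-distance exemplar picker
# that favors nearby-but-mutually-distinct peers.
import numpy as np


def hash_codes(embeddings, n_bits=8, seed=0):
    """Binary codes: sign of projections onto random hyperplanes."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((embeddings.shape[1], n_bits))
    return (embeddings @ planes > 0).astype(np.uint8)


def diverse_exemplars(codes, target, k=3):
    """Greedily pick up to k exemplars close to `target` in Hamming
    distance, requiring each pick's code to differ from previous picks."""
    dist = (codes != codes[target]).sum(axis=1)
    order = [int(i) for i in np.argsort(dist) if i != target]
    picked = []
    for i in order:
        if all((codes[i] != codes[j]).any() for j in picked):
            picked.append(i)
        if len(picked) == k:
            break
    return picked


emb = np.random.default_rng(1).standard_normal((20, 16))  # toy embeddings
codes = hash_codes(emb)
selection = diverse_exemplars(codes, target=0)
```

Because the codes are short binary strings, the selection is cheap (Hamming distance) and each bit can, in the learned setting, be read as an interpretable predicate rather than an opaque embedding coordinate.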