VeriTrace: Evolving Mental Models for Deep Research Agents

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing deep research agents struggle with vast, interdependent, and highly uncertain information: their intermediate representations are easily contaminated by low-quality inputs and lack explicit regulatory mechanisms, so model scale often ends up substituting for regulation. This work argues that an agent's mental model should instead evolve through three explicit regulatory loops: interpretive update, deviation feedback, and schema revision. The loops are implemented in VeriTrace, a cognitive-graph framework. With matched Qwen3.5-27B backbones, VeriTrace outperforms the strongest matched baseline by 4.22 percentage points on DeepResearch Bench (DRB) Insight (1.49 pp Overall) and by 5.9 pp Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DRB.

📝 Abstract

Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations should look like, but leave their evolution to the LLM's implicit reasoning. Without explicit regulation, the intermediate layer is easily contaminated by mixed-quality information and propagates errors along its dependencies, so model scale often ends up substituting for absent regulation. We argue that an agent's mental model should instead evolve through explicit feedback that continuously aligns task understanding with reality, and identify three regulatory loops: interpretive update, deviation feedback, and schema revision. We realise this in VeriTrace, a cognitive-graph framework that explicitly implements the three loops. Using matched Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on DeepResearch Bench (DRB) Insight (1.49 pp Overall) and by 5.9 pp Overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DRB.
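
To make the three loops concrete, the following is a minimal illustrative sketch, not VeriTrace's actual implementation. The abstract does not specify data structures or update rules, so every name here (`Node`, `CognitiveGraph`, the method names, the linear confidence update, and the 0.5 threshold) is a hypothetical assumption chosen only to show how claims with dependencies might be regulated explicitly rather than left to the LLM's implicit reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One claim in a toy cognitive graph (hypothetical structure)."""
    claim: str
    confidence: float                      # belief in [0, 1]
    depends_on: list = field(default_factory=list)


class CognitiveGraph:
    """Toy graph with three explicit regulation steps (all assumed, not from the paper)."""

    def __init__(self):
        self.nodes = {}

    def add(self, key, claim, confidence, depends_on=()):
        self.nodes[key] = Node(claim, confidence, list(depends_on))

    def interpretive_update(self, key, source_quality, supports):
        """Loop 1 (assumed rule): move confidence toward the evidence,
        weighted by source quality so weak sources shift belief less."""
        node = self.nodes[key]
        target = 1.0 if supports else 0.0
        node.confidence += source_quality * (target - node.confidence)

    def deviation_feedback(self, threshold=0.5):
        """Loop 2 (assumed rule): find claims below the threshold and every
        claim that depends on them, directly or transitively."""
        weak = {k for k, n in self.nodes.items() if n.confidence < threshold}
        affected = set(weak)
        changed = True
        while changed:
            changed = False
            for k, n in self.nodes.items():
                if k not in affected and any(d in affected for d in n.depends_on):
                    affected.add(k)
                    changed = True
        return weak, affected - weak

    def schema_revision(self, weak):
        """Loop 3 (assumed rule): drop weak claims and the edges pointing to
        them, so errors stop propagating along dependencies."""
        for k in weak:
            self.nodes.pop(k, None)
        for n in self.nodes.values():
            n.depends_on = [d for d in n.depends_on if d not in weak]


# Usage: a high-quality source contradicts claim "a", on which "b" depends.
g = CognitiveGraph()
g.add("a", "Market grew 10%", 0.8)
g.add("b", "Growth was driven by exports", 0.7, depends_on=["a"])
g.interpretive_update("a", source_quality=0.9, supports=False)
weak, downstream = g.deviation_feedback(threshold=0.5)
g.schema_revision(weak)
```

In this toy run the contradicted claim falls below the threshold, its dependent is flagged for re-examination, and the revision step removes the weak claim together with the edge to it. A real system would presumably re-verify flagged claims rather than simply detach them; that choice is left open here.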

Problem

Research questions and friction points this paper is trying to address.

deep research agents

mental models

information uncertainty

error propagation

intermediate representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

mental model

regulatory loops

cognitive graph