🤖 AI Summary
This work addresses an inefficiency in existing large language model agents: their reliance on syntax-heavy state representations such as nested JSON leads to poor context utilization, wasted attention, and impaired semantic reasoning. To overcome these limitations, the authors propose Fat-Cat, a document-driven agent architecture that encodes agent state as Markdown documents aligned with common pre-training corpora, so the model spends less attention on syntactic processing. The framework integrates a Semantic File System, a Textual Strategy Evolution module, and a Closed-Loop Watcher to accumulate task-solving knowledge and reduce hallucinations without parameter updates. Evaluated on reasoning, retrieval, and coding benchmarks, the approach consistently improves agent performance and enables Kimi-k2 to outperform the GPT-4o baseline on HotPotQA. Replacing the document-based state with JSON lowers performance, which supports document-driven state modeling over conventional JSON-based representations.
📝 Abstract
The effectiveness of LLM-based agents is often limited not by model capacity alone, but by how efficiently contextual information is utilized at runtime. Existing agent frameworks rely on rigid, syntax-heavy state representations such as nested JSON, which require models to devote a substantial portion of their limited attention to syntactic processing rather than semantic reasoning. In this paper, we propose Fat-Cat, a document-driven agent architecture that improves the signal-to-noise ratio of state management. Fat-Cat integrates three key components: (1) a Semantic File System that represents agent state as Markdown documents aligned with common pre-training corpora, (2) a Textual Strategy Evolution module that accumulates task-solving knowledge without parameter updates, and (3) a Closed-Loop Watcher that monitors reasoning trajectories to reduce hallucinations. Across reasoning, retrieval, and coding benchmarks, Fat-Cat consistently improves agent performance. It enables the Kimi-k2 model to outperform the proprietary GPT-4o baseline on HotPotQA. Replacing the document-based state with JSON leads to a performance drop, empirically validating the necessity of document-driven state modeling over rigid syntax. The code is available at https://github.com/answeryt/Fat-Cat.
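To make the contrast between JSON-based and document-based state concrete, the following is a minimal sketch that renders the same toy agent state once as nested JSON and once as a Markdown document. It is not taken from the Fat-Cat repository: the function `render_markdown`, the state keys, and the heading and bullet layout are all hypothetical, chosen only to illustrate the general idea.

```python
import json


def render_markdown(state, level=1):
    """Render a nested dict as Markdown lines.

    Hypothetical layout (an assumption, not Fat-Cat's actual format):
    dicts and lists become headings, list items become bullets,
    and scalar values become bold "key: value" lines.
    """
    lines = []
    for key, value in state.items():
        if isinstance(value, dict):
            lines.append(f"{'#' * level} {key}")
            lines.extend(render_markdown(value, level + 1))
        elif isinstance(value, list):
            lines.append(f"{'#' * level} {key}")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"**{key}:** {value}")
    return lines


# Toy agent state; the keys and values are illustrative only.
state = {
    "Task": {
        "goal": "Answer a multi-hop question",
        "status": "in progress",
    },
    "Plan": ["Retrieve supporting passages", "Compose the final answer"],
}

json_text = json.dumps(state, indent=2)
markdown_text = "\n".join(render_markdown(state))

print(json_text)
print()
print(markdown_text)
```

Printing both forms side by side shows the difference the abstract refers to: the JSON version carries braces, brackets, and quotation marks on nearly every line, while the Markdown version reads like ordinary prose with headings and bullets.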