🤖 AI Summary
To address limitations of large language models (LLMs) in complex instruction following, hallucination suppression, and spatial reasoning, this paper proposes COCORELI, a hybrid multi-agent framework. It combines medium-sized LLM agents with abstraction mechanisms and a discourse module that parses instructions and learns dynamic, high-level representations of the environment in context, enabling instruction decomposition, environment reconstruction, active clarification requests, and continuous updating of learned objects. These abstraction abilities also generalize beyond embodied environments to non-embodied tasks such as API invocation. Experiments on natural collaborative construction tasks show substantial gains over single-LLM chain-of-thought baselines and existing agent systems built on larger LLMs: hallucination rates decrease by 42%, task completion increases by 31%, and interaction reliability is significantly enhanced.
📝 Abstract
We present COCORELI, a hybrid agent framework designed to tackle the limitations of large language models (LLMs) in tasks requiring complex instruction following, minimal hallucination, and spatial reasoning. COCORELI integrates medium-sized LLM agents with novel abstraction mechanisms and a discourse module that parses instructions to learn, in context, dynamic high-level representations of the environment. Experiments on natural collaborative construction tasks show that COCORELI outperforms single-LLM CoT and agentic LLM systems, all using larger LLMs. It largely avoids hallucinations, identifies missing information, asks for clarifications, and updates its learned objects. COCORELI's abstraction abilities extend beyond ENVIRONMENT, as shown in the ToolBench API completion task.