AI Summary
This work addresses a limitation of large language models: lacking private working memory, they struggle to keep hidden state confidential while also responding consistently during interactive tasks. We propose the Private State Interactive Tasks (PSITs) framework, which formally defines this problem, and prove that agents relying solely on the public dialogue history cannot satisfy both the confidentiality and the consistency requirements. To overcome this limitation, we design an explicit private working memory architecture and introduce a self-consistency testing protocol that evaluates agents across forked dialogue branches. Experiments show that standard chat models and retrieval-based memory approaches fail the consistency test regardless of scale, whereas agents equipped with private working memory maintain consistency across branches.
Abstract
As LLMs move from text completion toward autonomous agents, they remain constrained by the standard chat interface, which lacks private working memory. This raises a fundamental question: can agents reliably perform interactive tasks that depend on hidden state? We define Private State Interactive Tasks (PSITs), which require agents to generate and maintain hidden information while producing consistent public responses. We show theoretically that any agent restricted to the public conversation history cannot simultaneously preserve secrecy and consistency in PSITs, yielding an impossibility theorem. To empirically validate this limitation, we introduce a self-consistency testing protocol that evaluates whether agents can maintain a hidden secret across forked dialogue branches. Standard chat-based LLMs and retrieval-based memory baselines fail this test regardless of scale, demonstrating that semantic retrieval does not enable true state maintenance. To address this, we propose a novel architecture incorporating an explicit private working memory; we demonstrate that this mechanism restores consistency, establishing private state as a necessary component for interactive language agents.
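The forked-branch idea can be illustrated with a toy guessing game. The sketch below is not the paper's protocol or architecture; it is a minimal illustration under stated assumptions: the secret is an integer from 1 to 8, each branch asks a single "is it x?" question, and the names `PublicOnlyAgent`, `PrivateMemoryAgent`, `passes_fork_test`, and `pass_rate` are hypothetical. An agent passes if it answers "yes" in exactly one of the forked branches.

```python
import random
from dataclasses import dataclass, field

CANDIDATES = range(1, 9)  # assumed secret space for this toy example


@dataclass
class PublicOnlyAgent:
    """Sees only the public transcript; has nowhere to store a secret."""
    rng: random.Random

    def reply(self, history, question):
        # With no private state, the agent re-imagines a secret on every call,
        # constrained only by the (question, answer) pairs already made public.
        consistent = [s for s in CANDIDATES
                      if all(ans == (s == q) for q, ans in history)]
        return self.rng.choice(consistent) == question


@dataclass
class PrivateMemoryAgent:
    """Commits to a secret once and keeps it in private working memory."""
    rng: random.Random
    secret: int = field(init=False)

    def __post_init__(self):
        self.secret = self.rng.choice(list(CANDIDATES))

    def reply(self, history, question):
        # The answer depends on the stored secret, never on a fresh sample.
        return self.secret == question


def passes_fork_test(agent, prefix=()):
    """Fork the dialogue after `prefix` and ask 'is it x?' once per branch."""
    yes_count = sum(agent.reply(list(prefix), x) for x in CANDIDATES)
    return yes_count == 1  # exactly one branch may confirm the secret


def pass_rate(make_agent, trials=200):
    """Fraction of independently seeded trials that pass the fork test."""
    passed = sum(passes_fork_test(make_agent(random.Random(t)))
                 for t in range(trials))
    return passed / trials


if __name__ == "__main__":
    print("public-only:", pass_rate(PublicOnlyAgent))
    print("private memory:", pass_rate(PrivateMemoryAgent))
```

In this toy setting the private-memory agent passes every trial by construction, while the public-only agent can confirm zero or several candidates across branches because each branch draws its own secret. A public-only agent could become consistent by writing the secret into the transcript, but that would give up secrecy, which is the trade-off the impossibility result describes.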