AI Summary
In partially observable environments, long interaction histories are hard to compress without sacrificing the optimality of downstream decisions. To address this, we propose Descriptive History Representations (DHRs), the first sufficient statistics formally defined by the principle of *answerability*: a representation is sufficient if it can answer the task-relevant questions about the history. Methodologically, we design a multi-agent framework that jointly trains representation learning, policy optimization, and question generation, combining question-driven representation learning, text generation, and user modeling. Experiments on public movie and shopping datasets show that the generated textual user profiles improve the accuracy of preference-behavior prediction (+8.2% to 12.7%) while remaining interpretable and useful for the task. Our core contribution is establishing answerability as a criterion for representation sufficiency, enabling joint optimization of history compression, decision control, and human-interpretable interaction.
Abstract
Effective decision making in partially observable environments requires compressing long interaction histories into informative representations. We introduce Descriptive History Representations (DHRs): sufficient statistics characterized by their capacity to answer relevant questions about past interactions and potential future outcomes. DHRs focus on capturing the information necessary to address task-relevant queries, providing a structured way to summarize a history for optimal control. We propose a multi-agent learning framework, involving representation, decision, and question-asking components, optimized using a joint objective that balances reward maximization with the representation's ability to answer informative questions. This yields representations that capture the salient historical details and predictive structures needed for effective decision making. We validate our approach on user modeling tasks with public movie and shopping datasets, generating interpretable textual user profiles which serve as sufficient statistics for predicting preference-driven behavior of users.
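As a rough illustration of the joint objective described above, the sketch below scores a textual profile by how many questions it can answer and adds that score to a task reward. Everything here is an assumption made for illustration: the names `QA`, `answerability`, `keyword_answer`, and `joint_objective`, the additive form with a scalar `weight`, and the keyword-matching answerer are hypothetical stand-ins, not the method or the components used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class QA:
    """A question about the user's history and its ground-truth answer."""
    question: str
    answer: str


def answerability(
    profile: str,
    qa_pairs: Sequence[QA],
    answer_fn: Callable[[str, str], str],
) -> float:
    """Fraction of questions answered correctly using only the profile."""
    if not qa_pairs:
        return 0.0
    correct = sum(answer_fn(profile, qa.question) == qa.answer for qa in qa_pairs)
    return correct / len(qa_pairs)


def keyword_answer(profile: str, genre: str) -> str:
    """Toy answerer (assumption): looks up a genre by plain keyword matching."""
    # Check "dislikes" first, because "dislikes X" also contains "likes X".
    if f"dislikes {genre}" in profile:
        return "no"
    if f"likes {genre}" in profile:
        return "yes"
    return "unknown"


def joint_objective(reward: float, answer_score: float, weight: float = 0.5) -> float:
    """Assumed additive trade-off between task reward and answerability."""
    return reward + weight * answer_score


profile = "likes sci-fi; dislikes horror"
questions = [QA("sci-fi", "yes"), QA("horror", "no"), QA("comedy", "yes")]

score = answerability(profile, questions, keyword_answer)
objective = joint_objective(reward=1.0, answer_score=score, weight=0.5)
print(f"answerability={score:.3f}, objective={objective:.3f}")
```

In this toy setup the profile answers two of the three questions, so a profile that also mentioned comedy would score higher under the same reward. In the framework described in the abstract, the answerer, the question generator, and the profile writer would be learned components rather than fixed functions.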