🤖 AI Summary
This work addresses a gap in reinforcement learning under partial observability: most memory-based methods do not explicitly model how symbolic observations are transferred from short-term to long-term memory. It casts this transfer, in a temporal knowledge-graph memory setting, as a neuro-symbolic value-based decision problem in which a per-item Q-learning mechanism decides, for each observed triple, whether to keep or drop it before insertion into a capacity-constrained long-term memory. Shared parameters and a temporal-difference update over items matched across consecutive steps let the method handle variable-sized short-term buffers. On the RoomKG benchmark at a long-term memory capacity of 128, the learned transfer decisions outperform symbolic and neural baselines, and a lightweight local short-term-only variant performs best among the transfer-policy ablations. Step-level behavior shows the policy keeping navigation- and query-relevant facts while discarding lower-value ones, which supports explicit and interpretable memory decisions.
📝 Abstract
Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.
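The per-item design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear Q-function, the `featurize` features, the relation names, the identity-based matching rule, and the choice to treat unmatched items as terminal are all assumptions made for illustration. The sketch only shows how one shared set of parameters can score every triple in a variable-sized short-term buffer, and how a temporal-difference update can be applied to items that reappear in the next step's buffer.

```python
import random

ACTIONS = ("drop", "keep")  # per-triple transfer decision


def featurize(triple):
    """Hypothetical hand-made features for a (head, relation, tail) triple."""
    _, relation, _ = triple
    return [
        1.0,                                                             # bias
        1.0 if relation == "at_location" else 0.0,                       # assumed query-relevant
        1.0 if relation in ("north", "east", "south", "west") else 0.0,  # assumed navigation
    ]


class PerItemQ:
    """Linear Q-function whose weights are shared by every item in the buffer."""

    def __init__(self, n_features, lr=0.1, gamma=0.9):
        self.w = [[0.0] * n_features for _ in ACTIONS]  # one weight row per action
        self.lr = lr
        self.gamma = gamma

    def q_values(self, triple):
        x = featurize(triple)
        return [sum(wi * xi for wi, xi in zip(w_a, x)) for w_a in self.w]

    def act(self, buffer, epsilon=0.0, rng=random):
        """Choose keep/drop independently for each triple; works for any buffer size."""
        decisions = {}
        for triple in buffer:
            if rng.random() < epsilon:
                decisions[triple] = rng.randrange(len(ACTIONS))
            else:
                q = self.q_values(triple)
                decisions[triple] = max(range(len(ACTIONS)), key=q.__getitem__)
        return decisions

    def td_update(self, decisions, reward, next_buffer, done=False):
        """One-step TD update per item, bootstrapping only from matched items.

        Assumption: an item is "matched" if the same triple appears in the next
        buffer; unmatched items are treated as terminal (no bootstrap term).
        """
        next_set = set(next_buffer)
        for triple, action in decisions.items():
            matched = (not done) and triple in next_set
            bootstrap = max(self.q_values(triple)) if matched else 0.0
            target = reward + self.gamma * bootstrap
            error = target - self.q_values(triple)[action]
            for i, xi in enumerate(featurize(triple)):
                self.w[action][i] += self.lr * error * xi


# Usage: buffers of different sizes are handled by the same shared weights.
agent = PerItemQ(n_features=3)
step_t = [("agent", "at_location", "kitchen"), ("kitchen", "north", "hall")]
step_t1 = [("agent", "at_location", "kitchen")]
decisions = agent.act(step_t, epsilon=0.1)
agent.td_update(decisions, reward=1.0, next_buffer=step_t1)
```

Because the weights are shared across items, the number of parameters does not depend on how many triples the short-term buffer holds at a given step. A neural encoder over triples could replace the linear scorer without changing the update's structure.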