🤖 AI Summary
This work addresses the poor generalization of reinforcement learning agents in out-of-distribution (OOD) scenarios, where high decision uncertainty degrades performance. The authors propose the ASK (Adaptive Safety through Knowledge) framework, which pairs a pretrained reinforcement learning policy with a small language model. ASK uses Monte Carlo Dropout to estimate epistemic uncertainty and queries the language model for an action suggestion only when that uncertainty exceeds a predefined threshold. This selective querying keeps the efficiency of the existing policy and avoids indiscriminately fusing the neural and symbolic components. In experiments on the FrozenLake environment, ASK shows no improvement in-domain but navigates robustly in transfer tasks, reaching a reward of 0.95.
📝 Abstract
Reinforcement learning (RL) agents often struggle with out-of-distribution (OOD) scenarios, leading to high uncertainty and random behavior. While language models (LMs) contain valuable world knowledge, larger ones incur high computational costs, hindering real-time use, and exhibit limitations in autonomous planning. We introduce Adaptive Safety through Knowledge (ASK), which combines smaller LMs with trained RL policies to enhance OOD generalization without retraining. ASK employs Monte Carlo Dropout to assess uncertainty and queries the LM for action suggestions only when uncertainty exceeds a set threshold. This selective use preserves the efficiency of existing policies while leveraging the language model's reasoning in uncertain situations. In experiments on the FrozenLake environment, ASK shows no improvement in-domain, but demonstrates robust navigation in transfer tasks, achieving a reward of 0.95. Our findings indicate that effective neuro-symbolic integration requires careful orchestration rather than simple combination, highlighting the need for sufficient model scale and effective hybridization mechanisms for successful OOD generalization.
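The uncertainty-gated querying described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the Monte Carlo Dropout samples are turned into an uncertainty score, so the normalized entropy of the averaged action distribution is used here as one plausible choice. The threshold of 0.5, the 20 samples, the toy linear policy, and all names (`make_dropout_policy`, `mc_dropout_mean`, `normalized_entropy`, `ask_select_action`, `query_lm`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# One stochastic forward pass: dropout stays active at inference time, so
# repeated calls on the same state may return different action distributions.
StochasticPolicy = Callable[[Sequence[float], random.Random], List[float]]


def make_dropout_policy(weights: Sequence[Sequence[float]],
                        p_drop: float) -> StochasticPolicy:
    """Toy linear-softmax policy with inverted dropout on its inputs.

    Stands in for a trained RL policy network (assumption for illustration).
    """
    def policy(state: Sequence[float], rng: random.Random) -> List[float]:
        # Drop each input with probability p_drop; rescale the survivors.
        keep = [0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop)
                for _ in state]
        logits = [sum(w * x * k for w, x, k in zip(row, state, keep))
                  for row in weights]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]
    return policy


def mc_dropout_mean(policy: StochasticPolicy, state: Sequence[float],
                    n_samples: int, rng: random.Random) -> List[float]:
    """Average the action distributions from n_samples stochastic passes."""
    samples = [policy(state, rng) for _ in range(n_samples)]
    n_actions = len(samples[0])
    return [sum(s[a] for s in samples) / n_samples for a in range(n_actions)]


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy divided by log(n_actions); lies in [0, 1] for 2+ actions."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def ask_select_action(policy: StochasticPolicy,
                      state: Sequence[float],
                      query_lm: Callable[[Sequence[float]], int],
                      threshold: float = 0.5,   # assumed value
                      n_samples: int = 20,      # assumed value
                      rng: random.Random = None) -> Tuple[int, bool]:
    """Return (action, used_lm): ask the LM only when uncertainty is high."""
    rng = rng or random.Random(0)
    mean_probs = mc_dropout_mean(policy, state, n_samples, rng)
    if normalized_entropy(mean_probs) > threshold:
        return query_lm(state), True
    # Otherwise keep the RL policy's own greedy action.
    best = max(range(len(mean_probs)), key=lambda a: mean_probs[a])
    return best, False
```

In this sketch `query_lm` is a placeholder for whatever prompt-and-parse step turns the language model's reply into an action index. Because the language model is called only on the high-uncertainty branch, the cost of the extra forward passes is paid on every step, while the cost of the language model is paid only where the policy is unsure.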