AI Summary
Current large language models (LLMs) suffer from static, parameterized knowledge, limiting their ability to access real-time information and reason about the physical world. To address this, we propose the *embodied reasoning* paradigm, dynamically grounding LLM inference in real-world environments. Our method introduces an adaptive reasoning mechanism that integrates intrinsic knowledge with heterogeneous external interfaces (including knowledge bases, structured tables, and textual environments), alongside a reinforcement learning-driven framework for active perception, interactive engagement, and feedback utilization. We further design a unified architecture supporting dynamic tool invocation and reflective reasoning. Empirically, our approach achieves significant performance gains on multi-hop question answering and mathematical reasoning benchmarks. Moreover, it successfully generalizes to unseen tasks (including knowledge base QA (KBQA), table-based QA (TableQA), and text-based games), demonstrating, for the first time, systematic evidence of LLMs' embodied reasoning capability in real-world settings.
Abstract
Recent advances in large language models (LLMs) demonstrate impressive reasoning capabilities. However, reasoning confined to the internal parametric space limits LLMs' access to real-time information and their understanding of the physical world. To overcome this constraint, we introduce SituatedThinker, a novel framework that enables LLMs to ground their reasoning in real-world contexts through situated thinking, which adaptively combines internal knowledge with external information obtained through predefined interfaces. Using reinforcement learning, SituatedThinker incentivizes deliberate reasoning with the real world to acquire information and feedback, allowing LLMs to surpass their knowledge boundaries and enhance their reasoning. Experimental results demonstrate significant performance improvements on multi-hop question-answering and mathematical reasoning benchmarks. Furthermore, SituatedThinker performs strongly on unseen tasks, such as KBQA, TableQA, and text-based games, showcasing a generalizable capability for real-world grounded reasoning. Our code is available at https://github.com/jnanliu/SituatedThinker.
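To make the idea of combining internal reasoning with predefined interfaces more concrete, the following is a minimal sketch of one possible interaction loop. It is not the SituatedThinker implementation: the tag syntax (`<lookup>`, `<result>`, `<answer>`), the function `situated_loop`, the scripted `toy_policy` standing in for an LLM, and the in-memory `TOY_KB` are all assumptions made for illustration, and the reinforcement learning stage is not shown.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical call syntax: <name>payload</name>. Not taken from the paper.
CALL_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

Interface = Callable[[str], str]


def situated_loop(
    generate: Callable[[str], str],
    interfaces: Dict[str, Interface],
    question: str,
    max_steps: int = 4,
) -> Optional[str]:
    """Alternate between model reasoning steps and external interface calls."""
    context = question
    for _ in range(max_steps):
        step = generate(context)  # one reasoning step from the model
        context += "\n" + step
        match = CALL_PATTERN.search(step)
        if match is None:
            continue  # purely internal reasoning, no external call
        name, payload = match.group(1), match.group(2).strip()
        if name == "answer":
            return payload  # the model committed to a final answer
        if name in interfaces:
            # Feed the interface output back so later steps can use it.
            context += f"\n<result>{interfaces[name](payload)}</result>"
    return None  # step budget exhausted without an answer


# Toy stand-ins (assumptions): a scripted policy and a one-entry knowledge base.
TOY_KB = {"capital of France": "Paris"}
INTERFACES: Dict[str, Interface] = {"lookup": lambda q: TOY_KB.get(q, "unknown")}


def toy_policy(context: str) -> str:
    """Scripted replacement for an LLM: look up first, then answer."""
    if "<result>" not in context:
        return "I do not know this offhand. <lookup>capital of France</lookup>"
    found = re.findall(r"<result>(.*?)</result>", context)[-1]
    return f"<answer>{found}</answer>"


if __name__ == "__main__":
    print(situated_loop(toy_policy, INTERFACES, "What is the capital of France?"))
```

In this sketch the model decides at each step whether to keep reasoning internally or to query an interface, and the returned result is appended to the context as feedback. A trained policy would replace `toy_policy`, and real knowledge bases, tables, or text environments would replace `TOY_KB`.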