RS-Claw: Progressive Active Tool Exploration via Hierarchical Skill Trees for Remote Sensing Agents

📅 2026-05-13

🤖 AI Summary
Existing remote sensing agents struggle with long-horizon, complex tasks because passive tool selection fails to balance context efficiency against tool completeness, which leads to either omitted critical tools or context overload. This work proposes an active exploration paradigm that encapsulates remote sensing tools within a hierarchical skill tree, so the agent decides in stages: it first selects a skill branch from abstract summaries, then loads detailed descriptions only as needed for precise and efficient tool invocation. By introducing active exploration into remote sensing tool selection, the method achieves an input token compression ratio of up to 86% on Earth-Bench, reducing semantic noise and outperforming both full tool registration (Flat) and retrieval-augmented generation (RAG) baselines on complex reasoning tasks.
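To make the 86% figure concrete, the sketch below computes a compression ratio under the assumption that it means the fraction of input tokens saved relative to a baseline prompt. The summary does not state the exact definition, and the token counts used here are hypothetical, not results from the paper.

```python
def compression_ratio(baseline_tokens: int, compressed_tokens: int) -> float:
    """Fraction of input tokens saved relative to the baseline prompt.

    Assumed definition: 1 - compressed / baseline.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1.0 - compressed_tokens / baseline_tokens


# Hypothetical token counts, chosen only to illustrate the arithmetic.
flat_prompt_tokens = 50_000    # every tool description registered up front
staged_prompt_tokens = 7_000   # summaries plus the descriptions actually loaded

ratio = compression_ratio(flat_prompt_tokens, staged_prompt_tokens)
print(f"input token compression: {ratio:.0%}")
```

Under this definition, a ratio of 86% means the staged prompt uses roughly one seventh of the tokens the baseline would.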
📝 Abstract
The rise of multi-modal large language models (MLLMs) is shifting remote sensing (RS) intelligence from "seeing" to "acting", as OpenClaw-style frameworks enable agents to autonomously operate large collections of RS image-processing tools for complex tasks. Existing RS agents adopt a passive selection paradigm for tool invocation, relying on either full tool registration (Flat) or retrieval-augmented generation (RAG). However, in the large, multi-source, heterogeneous RS tool ecosystem, such passive mechanisms struggle to dynamically balance "context load" against "toolset completeness" throughout task reasoning, and each has an inherent limitation: full tool registration runs short of context space during long-horizon tasks, whereas RAG retrieval may omit critical tools at essential steps. To overcome these bottlenecks, this paper reframes tool selection by arguing that the agent should act as an active explorer within the tool space. Based on this perspective, we propose RS-Claw, a novel RS agent architecture. By applying Skill encapsulation on the tool side, the architecture structures tool descriptions hierarchically, enabling the agent to make on-demand sequential decisions: first selecting relevant skill branches by reading only tool summaries, then dynamically loading detailed descriptions, and finally invoking the chosen tool precisely. This active paradigm frees a significant share of the agent's context space while keeping critical tools reliably selected during long-horizon reasoning. Systematic experiments on the Earth-Bench benchmark show that RS-Claw's active exploration mechanism filters semantic noise and frees up reasoning space, achieving an input token compression ratio of up to 86% and outperforming existing Flat and RAG baselines across complex reasoning evaluations.
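The staged decision process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The tree contents, the tool names, the `pick` keyword-overlap heuristic (a stand-in for the MLLM's choice), and the whitespace token counter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str   # short line shown while exploring
    details: str   # full description, loaded only once the tool is chosen


@dataclass
class SkillBranch:
    name: str
    summary: str
    tools: list = field(default_factory=list)


def make_tool(name: str, summary: str) -> Tool:
    # Hypothetical full description; real tool docs would be far longer.
    details = (f"{name}: {summary}. Inputs: raster path, band indices, nodata "
               "value. Outputs: single-band GeoTIFF. Options: scale factor, "
               "output dtype, tiling, compression, overwrite flag.")
    return Tool(name, summary, details)


def count_tokens(text: str) -> int:
    # Crude whitespace proxy; a real agent would use the model's tokenizer.
    return len(text.split())


def pick(query: str, options):
    # Stand-in for the agent's decision: keyword overlap with each summary.
    q = set(query.lower().split())
    return max(options, key=lambda o: len(q & set(o.summary.lower().split())))


def explore(query: str, tree):
    # Stage 1: read branch summaries only.
    context = [f"{b.name}: {b.summary}" for b in tree]
    branch = pick(query, tree)
    # Stage 2: read tool summaries inside the chosen branch.
    context += [f"{t.name}: {t.summary}" for t in branch.tools]
    tool = pick(query, branch.tools)
    # Stage 3: load the full description of the chosen tool.
    context.append(tool.details)
    return tool, sum(count_tokens(c) for c in context)


def flat_tokens(tree) -> int:
    # Flat registration: every full description sits in context up front.
    return sum(count_tokens(t.details) for b in tree for t in b.tools)


# Hypothetical skill tree; names and summaries are illustrative only.
TREE = [
    SkillBranch("spectral_indices",
                "compute vegetation and water indices from multispectral imagery",
                [make_tool("ndvi", "normalized difference vegetation index from red and nir bands"),
                 make_tool("ndwi", "normalized difference water index from green and nir bands")]),
    SkillBranch("change_detection",
                "detect land cover change between two acquisition dates",
                [make_tool("image_diff", "pixelwise difference of two co-registered scenes"),
                 make_tool("cva", "change vector analysis across spectral bands")]),
    SkillBranch("sar_processing",
                "filter and calibrate sar radar scenes",
                [make_tool("lee_filter", "speckle reduction with a lee filter"),
                 make_tool("calibrate", "radiometric calibration to sigma nought")]),
]

QUERY = "compute vegetation index ndvi for this multispectral image"
chosen, staged = explore(QUERY, TREE)
print(chosen.name, staged, flat_tokens(TREE))
```

In this toy setup the staged context grows with the depth of the path the agent follows, while the flat context grows with the total number of tools, which is the trade-off the abstract attributes to active exploration.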
Problem

Research questions and friction points this paper is trying to address.

remote sensing agents
tool selection
context load
toolset completeness
multi-modal large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

active tool exploration
hierarchical skill trees
skill encapsulation
context compression
remote sensing agents
Liangtian Liu
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Zeyuan Wang
PhD, The University of Sydney
NLP, Medical Informatics
Ziyu Li
Philips I&D Data & AI
Knowledge Extraction, Query Optimization, Machine Learning, Graph
Kai Ouyang
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Zichao Tang
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Chengfu Liu
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Haifeng Li
Central South University
GIS, Remote sensing, Machine learning, Sparse representation, Brain Theory
Hanwen Yu
School of Resources and Environment, University of Electronic Science and Technology of China, Xi'an 710071, China
Wentao Yang
School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan 411201, China; and Sanya Institute of Hunan University of Science and Technology, Sanya 572024, China
Cheng Yang
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
Dongyang Hou
School of Geosciences and Info-Physics, Central South University, Changsha 410083, China