Curriculum-Based Multi-Tier Semantic Exploration via Deep Reinforcement Learning

📅 2025-09-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional reinforcement learning struggles to balance efficient exploration with deep semantic understanding in complex, unknown environments, primarily due to the limited cognitive capacity of policy networks and frequent reliance on human intervention. To address this, we propose an embodied semantic exploration framework endowed with high-level cognitive capabilities. Our method introduces a hierarchical reward mechanism to guide multi-stage decision-making, designs a vision-language model (VLM)-driven query action module for dynamic external commonsense retrieval, and incorporates curriculum learning for progressive capability acquisition. The approach unifies deep reinforcement learning, VLM-based commonsense reasoning, and structured reward engineering. Experimental results demonstrate substantial improvements in object discovery rates, autonomous navigation to semantically rich regions, and learned strategic invocation of VLM queries—enabling resource-efficient, commonsense-augmented exploration.

📝 Abstract
Navigating and understanding complex, unknown environments autonomously demands more than basic perception and movement from embodied agents. Truly effective exploration requires higher-level cognitive abilities: agents must reason about their surroundings and make informed decisions about exploration strategies. Traditional RL approaches, however, struggle to balance efficient exploration with semantic understanding because of the limited cognitive capacity of their small policy networks, so semantic exploration often falls back on human intervention. In this paper, we address this challenge with a novel Deep Reinforcement Learning (DRL) architecture designed specifically for resource-efficient semantic exploration. A key methodological contribution is the integration of Vision-Language Model (VLM) common-sense knowledge through a layered reward function. The VLM query is modeled as a dedicated action, allowing the agent to query the VLM strategically, only when external guidance is deemed necessary, thereby conserving resources. This mechanism is combined with a curriculum learning strategy that guides learning across levels of complexity to ensure robust and stable training. Our experimental results demonstrate that the agent achieves significantly higher object discovery rates, develops a learned capability to navigate towards semantically rich regions, and exhibits strategic mastery of when to prompt for external environmental information. By demonstrating a practical and scalable method for embedding common-sense semantic reasoning in autonomous agents, this research offers a novel approach to fully intelligent, self-guided exploration in robotics.
Problem

Research questions and friction points this paper is trying to address.

Balancing efficient exploration with semantic understanding in robotics
Overcoming limited cognitive capabilities in traditional RL approaches
Integrating Vision-Language Models for resource-efficient semantic exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Vision-Language Model via layered reward
Uses curriculum learning for complexity progression
Models VLM query as strategic dedicated action
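The query-as-action idea above can be sketched as a toy layered reward: the action space is augmented with one dedicated "query the VLM" action, whose fixed cost makes the agent learn to invoke it sparingly. All names and penalty values below are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of "VLM query as a dedicated action" with a layered reward.
# Action names, QUERY_COST, and FIND_BONUS are hypothetical placeholders.

MOVE_ACTIONS = ["forward", "turn_left", "turn_right"]
QUERY_VLM = "query_vlm"  # dedicated action requesting external guidance
ACTIONS = MOVE_ACTIONS + [QUERY_VLM]

QUERY_COST = 0.1   # assumed fixed penalty discouraging gratuitous queries
FIND_BONUS = 1.0   # assumed bonus for discovering a target object


def step_reward(action: str, found_object: bool) -> float:
    """Layered reward: discovery bonus, minus a cost when the VLM is queried.

    The agent still earns the discovery bonus on a query step, so querying
    pays off only when the external guidance actually improves discovery.
    """
    reward = FIND_BONUS if found_object else 0.0
    if action == QUERY_VLM:
        reward -= QUERY_COST
    return reward
```

Under this shaping, a policy that queries on every step forfeits reward relative to one that queries only when the guidance changes the outcome, which is the resource-efficiency argument the summary makes.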
Abdel Hakim Drid
Department of Electrical Engineering, Mohamed Khider University of Biskra, Biskra, Algeria
Vincenzo Suriani
Sapienza University of Rome
Daniele Nardi
Sapienza University of Rome, Dept. of Computer, Control and Management Engineering
Artificial Intelligence · Robotics · Multi-Agent Systems
Abderrezzak Debilou
Department of Electrical Engineering, Mohamed Khider University of Biskra, Biskra, Algeria