HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

📅 2025-03-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current Vision-and-Language Navigation (VLN) systems are largely confined to either discrete or continuous paradigms, limiting their ability to handle social interactions in dynamic, multi-pedestrian environments. To address this, we propose the first unified framework that explicitly models human social intent while jointly leveraging discrete and continuous navigation. Our method introduces a discrete-continuous cooperative task formulation and incorporates personal-space constraints. We release HAPS 2.0, a large-scale human motion dataset, and an enhanced simulator; construct a human-centered instruction evaluation benchmark (16,844 instances); and validate real-world transfer in crowded physical settings. The approach integrates multi-agent simulation, motion-language alignment learning, partially observable reinforcement learning, and social-distance-aware planning. Experiments demonstrate significant improvements in navigation success rate and substantial reductions in collision frequency, confirming the critical role of social context modeling for safe navigation. All data, code, and evaluation tools are publicly released to advance standardization in human-centered VLN.
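The summary mentions personal-space constraints as a core ingredient of the task formulation. The paper's own thresholds and geometry are not given here, but the idea can be illustrated with a minimal sketch: count the waypoints along an agent's path that fall inside a (hypothetical) personal-space radius around any pedestrian. All names, coordinates, and the radius value below are illustrative assumptions, not the paper's implementation.

```python
import math

def personal_space_violations(agent_path, human_positions, radius=0.5):
    """Count waypoints where the agent enters any human's personal space.

    agent_path: list of (x, y) waypoints; human_positions: list of (x, y).
    radius: hypothetical personal-space radius in meters (the paper's
    actual threshold is not reproduced here).
    """
    violations = 0
    for ax, ay in agent_path:
        for hx, hy in human_positions:
            if math.hypot(ax - hx, ay - hy) < radius:
                violations += 1
                break  # count each waypoint at most once
    return violations

path = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
humans = [(1.0, 0.3)]
print(personal_space_violations(path, humans))  # → 1 (only the last waypoint is within 0.5 m)
```

A planner enforcing the constraint would reject or penalize candidate paths for which this count is nonzero, which is one simple way to realize the social-distance-aware planning the summary describes.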

πŸ“ Abstract
Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone, overlooking the complexities of human-populated, dynamic environments. We introduce a unified Human-Aware VLN (HA-VLN) benchmark that merges these paradigms under explicit social-awareness constraints. Our contributions include:

1. A standardized task definition that balances discrete-continuous navigation with personal-space requirements;
2. An enhanced human motion dataset (HAPS 2.0) and upgraded simulators capturing realistic multi-human interactions, outdoor contexts, and refined motion-language alignment;
3. Extensive benchmarking on 16,844 human-centric instructions, revealing how multi-human dynamics and partial observability pose substantial challenges for leading VLN agents;
4. Real-world robot tests validating sim-to-real transfer in crowded indoor spaces; and
5. A public leaderboard supporting transparent comparisons across discrete and continuous tasks.

Empirical results show improved navigation success and fewer collisions when social context is integrated, underscoring the need for human-centric design. By releasing all datasets, simulators, agent code, and evaluation tools, we aim to advance safer, more capable, and socially responsible VLN research.
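The abstract reports navigation success and collision frequency as the headline metrics over the 16,844-instruction benchmark. As a hedged sketch of how such aggregate metrics are typically computed (not the paper's evaluation code; the episode schema below is an assumption), each episode can record a success flag and a collision count:

```python
# Illustrative sketch: aggregating success rate (SR) and collision
# rate (CR) over benchmark episodes. Field names are hypothetical.
def aggregate_metrics(episodes):
    n = len(episodes)
    success_rate = sum(e["success"] for e in episodes) / n
    collision_rate = sum(e["collisions"] > 0 for e in episodes) / n
    return {"SR": success_rate, "CR": collision_rate}

episodes = [
    {"success": True, "collisions": 0},
    {"success": False, "collisions": 2},
    {"success": True, "collisions": 1},
]
print(aggregate_metrics(episodes))  # SR = CR = 2/3
```

The claimed improvement, fewer collisions at a higher success rate when social context is modeled, would show up here as a joint increase in SR and decrease in CR between baseline and human-aware agents.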
Problem

Research questions and friction points this paper is trying to address.

Existing VLN systems treat discrete and continuous navigation in isolation, without human-aware constraints.
Dynamic, multi-human environments introduce partial observability and social-interaction challenges that current agents handle poorly.
Sim-to-real transfer of socially aware navigation in crowded indoor spaces remains largely unvalidated.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark merging discrete-continuous navigation paradigms
Enhanced human motion dataset capturing realistic interactions
Real-world robot tests validating sim-to-real transfer
Authors

Yifei Dong (KTH Royal Institute of Technology)
Fengyi Wu (unknown affiliation)
Qi He (University of Washington)
Heng Li (University of Washington)
Minghan Li (Galbot)
Zebang Cheng (Shenzhen University)
Yuxuan Zhou (University of Mannheim)
Jingdong Sun (Carnegie Mellon University)
Qi Dai (Microsoft Research)
Zhi-Qi Cheng (University of Washington)
Alexander G. Hauptmann (Carnegie Mellon University)