Wenzhe Cai

Google Scholar ID: NHQcCyAAAAAJ

Shanghai AI Laboratory

Reinforcement Learning, Visual Navigation, Robotics

Citations & Impact

Citations (all-time): 397

Contact

Email: wz_cai@seu.edu.cn

Publications

10 items

Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation

2026

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

2026

LabBuilder: Protocol-Grounded 3D Layout Generation for Interactable and Safe Laboratory

2026

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

2025

ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

2025

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

2025

NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions

2025

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

2025

Resume

Academic Achievements

InternVLA-N1: The first open dual-system vision-language navigation foundation model.
InternScenes: A large-scale interactive indoor scene dataset comprising approximately 40,000 diverse scenes.
NavDP: Learning sim-to-real navigation diffusion policy with privileged information guidance.
StreamVLN: A streaming VLN framework that employs a hybrid slow-fast context modeling strategy to support multi-modal reasoning over interleaved vision, language, and action inputs.
ImagineNav: A novel navigation decision framework that uses imagination to generate candidate future images and lets a VLM select among them.
Boosting Efficient Reinforcement Learning for Vision-and-Language Navigation with Open-Sourced LLM: A hierarchical reinforcement learning method using efficient open-sourced LLMs as a high-level planner and an RL-based policy for sub-instruction accomplishment.
InstructNav: A zero-shot system for generic instruction navigation in unexplored environments.
MO-DDN: A coarse-to-fine attribute-based exploration agent for multi-object demand-driven navigation.

Research Experience

Researcher at Shanghai AI Laboratory, working closely with Dr. Tai Wang and Dr. Jiangmiao Pang.

Education

Ph.D. from Southeast University, advised by Prof. Changyin Sun. Visiting student at Peking University, advised by Prof. Hao Dong.

Background

Research Interests: Embodied AI, especially building intelligent robots that can comprehend diverse language instructions and exhibit adaptive navigation behaviors in dynamic, open-world environments. Specializations: Embodied AI, Visual Navigation, and Deep Reinforcement Learning.
