InternVLA-N1: The first open dual-system vision-language navigation foundation model.
InternScenes: A large-scale interactive indoor scene dataset comprising approximately 40,000 diverse scenes.
NavDP: Learning sim-to-real navigation diffusion policy with privileged information guidance.
StreamVLN: A streaming VLN framework that employs a hybrid slow-fast context modeling strategy to support multi-modal reasoning over interleaved vision, language, and action inputs.
ImagineNav: A novel navigation decision framework that uses imagination to generate candidate future images and lets VLMs select among them.
Boosting Efficient Reinforcement Learning for Vision-and-Language Navigation with Open-Sourced LLM: A hierarchical reinforcement learning method that uses an efficient open-source LLM as the high-level planner and an RL-based policy to accomplish sub-instructions.
InstructNav: A zero-shot system for generic instruction navigation in unexplored environments.
MO-DDN: A coarse-to-fine attribute-based exploration agent for multi-object demand-driven navigation.
Research Experience
Researcher at Shanghai AI Laboratory, working closely with Dr. Tai Wang and Dr. Jiangmiao Pang.
Education
Ph.D. from Southeast University, advised by Prof. Changyin Sun.
Visiting student at Peking University, advised by Prof. Hao Dong.
Background
Research Interests: Embodied AI, especially building intelligent robots that can comprehend diverse language instructions and exhibit adaptive navigation behaviors in the dynamic open world.
Specializations: Embodied AI, Visual Navigation, and Deep Reinforcement Learning.