Dongming Wu
Scholar

Google Scholar ID: ejFCAq0AAAAJ
MMLab, CUHK; CPII
Computer Vision, Vision and Language, MLLM, Embodied AI
Citations & Impact (all-time)
  • Citations: 1,071
  • H-index: 12
  • i10-index: 12
  • Publications: 17
  • Co-authors: 7
Resume (English only)
Academic Achievements
  • Publications: ICCV 2025 - RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark; CVPR 2025 - DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation; AAAI 2025 - Language Prompt for Autonomous Driving; ECCV 2024 - Merlin: Empowering Multimodal LLMs; ICLR 2024 - TopoMLP
  • Preprints: Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding; Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?; Bootstrapping Referring Multi-Object Tracking
  • Awards: 2025.06 Outstanding Graduates of Beijing; 2024.05 Excellent Doctoral Thesis Seedling Fund
Research Experience
  • Dexmal, Research Intern; mentors: Yingfei Liu and Tiancai Wang
  • MBZUAI, Visiting Student; mentors: Prof. Rao Muhammad Anwer and Prof. Fahad Shahbaz Khan
  • MEGVII, Research Intern; mentors: Tiancai Wang and Xiangyu Zhang
  • IIAI, Research Intern; mentors: Xingping Dong and Prof. Ling Shao
Education
  • 2025.06: Ph.D., Department of Computer Science, Beijing Institute of Technology; advised by Prof. Jianbing Shen
  • 2019.06: Bachelor's degree from the Class of Xu, Beijing Institute of Technology
Background
  • Research interests include vision-language learning, multimodal large language models (MLLMs), and embodied agents. During graduate studies, the focus was on building intelligent perception models that understand both visual and linguistic information. Recent work explores decision-making systems capable of actively interacting with humans and dynamic environments. The ultimate goal is to develop human-like agents that can perceive real-world environments and make autonomous decisions.
Miscellany
  • Open to collaboration and to discussions about the latest advancements in the field.