Scholar

Haodong Duan

Google Scholar ID: vi3W-m8AAAAJ

Shanghai AI Lab | CUHK | PKU

Computer Vision, Video Understanding, Multimodal Learning, Generative AI

Citations & Impact

Citations (all-time): 7,554

Contact

Email: dhd.efz@gmail.com

Publications

46 items

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

2026

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

2026

Can Retrieval Heads See Images? Multimodal Retrieval Heads in Long-Context Vision-Language Models

2026

OpenCompass: A Universal Evaluation Platform for Large Language Models

2026

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

2026

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

2026

OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

2026

Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs

2026

Resume (English only)

Academic Achievements

Three papers accepted by NeurIPS 2024 main conference: InternLM-XComposer2-4KHD, MMStar, Prism (September 2024).
Three papers accepted by NeurIPS 2024 Dataset & Benchmark Track: ShareGPT4Video, GMAI-MMBench, MMBench-Video (September 2024).
MMBench accepted by ECCV 2024 as an oral presentation (August 2024).
MathBench accepted by ACL 2024 (May 2024).
Two papers (BotChat, Ada-LEval) accepted by NAACL 2024 (March 2024).
Released VLMEvalKit, an all-in-one toolkit for evaluating LVLMs, which was accepted by MM 2024 (December 2023).
SkeleTR accepted by ICCV 2023 (October 2023).
Released PYSKL, a codebase for skeleton action recognition, which was accepted by MM 2022 (May 2022).
Three papers accepted by CVPR 2022, with PoseC3D and TransRank as oral presentations and OCSampler as a poster (March 2022).
OmniSource accepted by ECCV 2020 (July 2020).
TRB accepted by ICCV 2019 as an oral presentation (July 2019).

Research Experience

Joined Shanghai AI Lab as a postdoctoral researcher in October 2023.
Interned at AWS AI from July 2022, advised by Dr. Mingze Xu.
Joined OpenMMLab in August 2020 and served as a maintainer of MMAction2.
Served as a reviewer for international conferences including ICCV, AAAI, CVPR, ECCV, and NeurIPS.
Served as a reviewer for journals including TPAMI, IJCV, and TIP.

Background

His research interests include video recognition, human-centric action understanding, and multi-modality learning. He is currently a postdoctoral researcher at Shanghai AI Lab, focusing on the evaluation of large language models and multi-modality models.

Miscellany