July 2025: Released the Video Thinking Test (Video-TT), a holistic benchmark that assesses the correctness and robustness of advanced video reasoning and understanding in MLLMs against human performance.
October 2024: Updated LLaVA-Video (formerly LLaVA-NeXT-Video), releasing both the model and the data.
August 2024: Released LLaVA-OneVision, an LMM that excels across single-image, multi-image, and video tasks.
July 2024: Received the IJCV Outstanding Reviewer Award 2023.
July 2024: NOAH accepted to TPAMI.
July 2024: Three papers accepted at ECCV 2024.
June 2024: Organized CVPR 2024 workshop: Prompting in Vision.
May 2024: Released LLaVA-NeXT-Video.
September 2023: Visual Prompt Retrieval accepted to NeurIPS 2023.
September 2023: Talk at Alibaba DAMO Academy, hosted by Dr. Lidong Bing.
July 2023: Talk at HITSZ, hosted by Prof. Rui Shao.
June 2023: Introducing Otter.
October 2022: Won 1st place in the Computer Vision in the Wild Challenge.
July 2022: OmniBenchmark accepted to ECCV 2022.
March 2022: Released the Bamboo dataset.
Research Experience
Contributed to multiple research projects, including the Video Thinking Test (Video-TT), LLaVA-Video, and LLaVA-OneVision, with papers published at international conferences such as NeurIPS and ECCV.
Education
Third-year PhD student at MMLab@NTU, supervised by Prof. Ziwei Liu.
Background
Research Interests: Computer vision and deep learning. My research focuses on adapting foundation models, from vision-only to multimodal, for real-world use: benchmarking model performance and adapting models through parameter-efficient tuning, in-context learning, and instruction tuning.
Miscellany
Contact: yuanhan002@e.ntu.edu.sg / Google Scholar / Twitter / GitHub