July 2025: Released the Video Thinking Test (Video-TT), a holistic benchmark that assesses the correctness and robustness of advanced video reasoning and understanding in MLLMs against human performance.
October 2024: Updated LLaVA-Video (formerly LLaVA-NeXT-Video), releasing both the model and the data.
August 2024: Released LLaVA-OneVision, an LMM that excels across single-image, multi-image, and video tasks.
July 2024: Received the IJCV Outstanding Reviewer Award 2023.
July 2024: NOAH accepted to TPAMI.
July 2024: Three papers accepted at ECCV 2024.
June 2024: Organized CVPR 2024 workshop: Prompting in Vision.
May 2024: Released LLaVA-NeXT-Video.
September 2023: Visual Prompt Retrieval accepted to NeurIPS 2023.
September 2023: Talk at Alibaba DAMO Academy, hosted by Dr. Lidong Bing.
July 2023: Talk at HITSZ, hosted by Prof. Rui Shao.
June 2023: Introducing Otter.
October 2022: Won 1st place in the Computer Vision in the Wild Challenge.
July 2022: OmniBenchmark accepted to ECCV 2022.
March 2022: Released the Bamboo dataset.
Research Experience
Contributed to multiple research projects, including the Video Thinking Test (Video-TT), LLaVA-Video, and LLaVA-OneVision, with papers published at international conferences such as NeurIPS and ECCV.
Education
Third-year PhD student at MMLab@NTU, supervised by Prof. Ziwei Liu.
Background
Research Interests: Computer vision and deep learning. My research focuses on adapting foundation models, from vision-only to multimodal, for real-world use: benchmarking model performance and adapting models through parameter-efficient tuning, in-context learning, and instruction tuning.
Miscellany
Contact: yuanhan002@e.ntu.edu.sg / Google Scholar / Twitter / GitHub