Xiaohan Wang

Google Scholar ID: iGA10XoAAAAJ
Stanford University
Computer Vision, Video Understanding, Large Multimodal Models
Citations & Impact (all-time)
  • Citations: 2,569
  • H-index: 28
  • i10-index: 41
  • Publications: 20
  • Co-authors: 0
Resume
Academic Achievements
  • Publications:
    - Temporal Preference Optimization for Long-Form Video Understanding (2025)
    - Apollo: An Exploration of Video Understanding in Large Multimodal Models (2024)
    - Video-STaR: Bootstrapping Weak Video Supervision for Visual Instruction Tuning (2025)
    - Video Action Differencing (2025)
    - Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration (2024)
  • Talks, Project Releases, and Paper Acceptances:
    - Gave a talk at the 'What is Next in Video Understanding' workshop at CVPR 2024
    - Released the Temporal Preference Optimization (TPO) framework
    - Released the Apollo project
    - VLM Classifier accepted to NeurIPS 2024
    - VideoAgent accepted to ECCV 2024
    - VisDiff accepted as an oral presentation at CVPR 2024
    - RLCF accepted to ICLR 2024
Research Experience
  • Collaborated with researchers at Baidu Research and Facebook AI Research during Ph.D. studies. Currently working at Stanford University with Prof. Serena Yeung.
Education
  • Ph.D.: University of Technology Sydney, advised by Prof. Yi Yang; B.E.: University of Science and Technology of China.
Background
  • Research interests include Video Understanding, Multimodal Learning, and AI for Healthcare. Currently a Postdoc at Stanford University, affiliated with MARVL and Stanford AI Lab.