- Temporal Preference Optimization for Long-Form Video Understanding (2025)
- Apollo: An Exploration of Video Understanding in Large Multimodal Models (2024)
- Video-STaR: Bootstrapping Weak Video Supervision for Visual Instruction Tuning (2025)
- Video Action Differencing (2025)
- Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration (2024)
Talks and Project Releases
- Gave a talk at the 'What is Next in Video Understanding' workshop at CVPR 2024
- Released the Temporal Preference Optimization (TPO) framework
- Released the Apollo project
- VLM Classifier accepted to NeurIPS 2024
- VideoAgent accepted to ECCV 2024
- VisDiff accepted as an oral presentation at CVPR 2024
- RLCF accepted to ICLR 2024
Research Experience
Collaborated with researchers at Baidu Research and Facebook AI Research during Ph.D. studies. Currently working at Stanford University with Prof. Serena Yeung.
Education
Ph.D.: University of Technology Sydney, advised by Prof. Yi Yang; B.E.: University of Science and Technology of China.
Background
Research interests include Video Understanding, Multimodal Learning, and AI for Healthcare. Currently a Postdoc at Stanford University, affiliated with MARVL and Stanford AI Lab.