Has published multiple papers at top international conferences such as ICCV, CVPR, and ICLR; publications include 'Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities', 'Toward Versatile and Efficient Multimodal Models' (PhD thesis), and 'LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models'.
Research Experience
Works as a Research Scientist at Google DeepMind on the Gemini Multimodal project.
Education
Received a Ph.D. in Computer Sciences from the University of Wisconsin-Madison, advised by Prof. Yong Jae Lee.
Background
Research interests include multimodal models and vision-language models; currently a Research Scientist at Google DeepMind.
Miscellany
Recent talk videos on critiquing and creating vision-language models are available; contact channels include email, GitHub, Google Scholar, LinkedIn, Twitter (X), and a blog.