Publications
Published multiple papers, including:
'Language Repository for Long Video Understanding' (ACL Findings 2025)
'Understanding Long Videos with Multimodal Language Models' (ICLR 2025)
'LLaRA: Large Language and Robotics Assistant' (ICLR 2025)
'Localization in Visual-LLMs Improves Reasoning' (CVPR 2024)
'Language-based Video Self-Supervised Learning' (NeurIPS 2023)
'T2I Diffusion Models are Zero-Shot Segmentors' (CVPR workshop 2023)
'Perceptual Grouping in Contrastive VLMs' (ICCV 2023)
'Self-supervised Video Transformers' (CVPR 2022 oral)
'Adversarial Transferability of Vision Transformers' (ICLR 2022 spotlight)
'Intriguing Properties of Vision Transformers' (NeurIPS 2021 spotlight)
'Orthogonal Projection Loss' (ICCV 2021)
'Conditional Generative Modeling' (ICLR 2021)
'Activity Recognition in Videos' (TCSVT journal)
Research Experience
Research Intern at Salesforce AI Research with Juan Carlos Niebles. Former intern at Google Research with Srikumar Ramalingam, at Meta with Tsung-Yu Lin, and at Apple with Jonathon Shlens and Alexander Toshev. Former researcher at MBZUAI with Salman Khan, Muzammal Naseer, and Fahad Khan.
Education
PhD student at Stony Brook University, NY, USA, advised by Michael Ryoo.
Background
Computer science enthusiast interested in Computer Vision and Machine Learning, with a focus on Video Understanding, Vision-Language Representations, and Robot Learning.
Miscellany
Enjoys ballroom dancing, cooking, and theatre in leisure time.