- 'Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D' accepted to EMNLP 2025
- 'ViUniT: Visual Unit Tests for More Robust Visual Programming' accepted to CVPR 2025
- 'Evaluating Vision-Language Models on Bistable Images' received the best paper award at CMCL 2024
- 'X-InstructBLIP: A Framework for Aligning X-Modal Instruction-Aware Representations to LLMs and Emergent Cross-Modal Reasoning' accepted to ECCV 2024
- 'ULIP-2' accepted to CVPR 2024
Research Experience
Currently a student researcher at Google (Augmented Reality); previously a research intern at Salesforce AI.
Education
PhD Student at the University of Pennsylvania; Supervisors: Professor Chris Callison-Burch and Professor Mark Yatskar.
Background
Research Interests: The intersection of Natural Language Processing and Computer Vision.
Specialization: Multimodal AI, integrating diverse modalities such as images, audio, video, text, and 3D.
Summary: Focused on developing trustworthy models that can see, listen, and comprehend with nuance and perceptual coherence.
Miscellany
Passionate about education: serves as a Teaching Assistant at the University of Pennsylvania and teaches in community programs, aiming to challenge students with the beautiful, mentally stimulating concepts of mathematics, logic, and computer science while breaking down mental barriers left by past negative experiences.