Publications include:
- 'Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models' (NeurIPS 2024)
- 'BLINK: Multimodal Large Language Models Can See but Not Perceive' (ECCV 2024)
- 'Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models' (CVPR 2024, Oral)
- 'Fine-Grained Human Feedback Gives Better Rewards for Language Model Training' (NeurIPS 2023, Spotlight)
- 'TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering' (ICCV 2023)
- 'PromptCap: Prompt-Guided Task-Aware Image Captioning' (ICCV 2023)
- 'Decoding-Time Language Model Alignment with Multiple Objectives' (NeurIPS 2024)
- 'Training Language Models to Generate Text with Citations via Fine-grained Rewards' (ACL 2024)
Research Experience
Internships at Meta GenAI, the Allen Institute for AI (AI2), and Google Research; close collaboration with Prof. Ranjay Krishna.
Education
PhD: University of Washington (UW), advised by Mari Ostendorf and Noah A. Smith. B.S.: University of Chicago (2021), majoring in Mathematics, Computer Science, and Economics; undergraduate research at the Toyota Technological Institute at Chicago (TTIC), advised by Karen Livescu.
Background
Research interests: Building multimodal models that can understand, reason, and generate across many modalities (text, image, video, etc.), and using these models to build powerful multimodal agents.