Yushi Hu
Google Scholar ID: mXN51X0AAAAJ
University of Washington
Natural Language Processing · Computer Vision
Citations & Impact (All-time)
  • Citations: 2,640
  • H-index: 17
  • i10-index: 17
  • Publications: 20
  • Co-authors: 15
Academic Achievements
  • 'Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models' (NeurIPS 2024)
  • 'BLINK: Multimodal Large Language Models Can See but Not Perceive' (ECCV 2024)
  • 'Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models' (CVPR 2024, Oral)
  • 'Fine-Grained Human Feedback Gives Better Rewards for Language Model Training' (NeurIPS 2023, Spotlight)
  • 'TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering' (ICCV 2023)
  • 'PromptCap: Prompt-Guided Task-Aware Image Captioning' (ICCV 2023)
  • 'Decoding-Time Language Model Alignment with Multiple Objectives' (NeurIPS 2024)
  • 'Training Language Models to Generate Text with Citations via Fine-grained Rewards' (ACL 2024)
Research Experience
  • Internships at Meta GenAI, Allen Institute for AI (AI2), and Google Research; closely collaborates with Prof. Ranjay Krishna.
Education
  • PhD: University of Washington (UW); Advisors: Mari Ostendorf and Noah A. Smith.
  • B.S.: University of Chicago (2021); Majors: Mathematics, Computer Science, and Economics; research at Toyota Technological Institute at Chicago (TTIC), advised by Karen Livescu.
Background
  • Research interests: Building multimodal models that can understand, reason, and generate across many modalities (text, image, video, etc.), and building powerful multimodal agents on top of these models.