Honors & Publications
Recipient of the 2024 Apple AI/ML PhD Fellowship (data-centric AI track) and the OpenAI Superalignment Fellowship; selected as one of the 2025 ML and Systems Rising Stars.
Publications:
- 'Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base' (COLM 2025)
- 'Task Me Anything' (NeurIPS 2024)
- 'One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory' (ICCV 2025)
- 'ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models' (covered by the Salesforce Research blog, VentureBeat, and MarkTechPost)
- 'H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos' (SynData4CV @ CVPR 2025)
- 'Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias' (NeurIPS 2023)
- 'SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality' (NeurIPS 2023)
- 'DataComp: In Search of the Next Generation of Multimodal Datasets' (NeurIPS 2023)
- 'On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training' (NeurIPS 2023)
Teaching & Mentoring
Co-instructor for CSE 455: Computer Vision (undergrad); TA for CSE 344: Introduction to Data Management (undergrad); TA for CSE 599C: Training Data Management & Weak Supervision (grad); mentor for CSE 492R: CSE Group Research (undergrad); mentor for the UW CSE Pre-Application Mentorship Service (PAMS) program.
Education
Currently a sixth-year Ph.D. student in Computer Science at the University of Washington, Seattle, advised by Prof. Ranjay Krishna and Prof. Alex Ratner; completed an undergraduate degree in Computer Science at the University of Illinois Urbana-Champaign, where research was advised by Prof. Jiawei Han.
Background
Research interests center on data-centric AI, with applications across multimodal models, computer vision, natural language processing, and the sciences. Current research directions include dynamic evaluation, data utilization, and data synthesis and collection.
Miscellany
Passionate about mentoring and teaching; students from underrepresented groups are especially encouraged to reach out.