Publications include:
'Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models' (CVPR 2025, Oral, Best Paper Honorable Mention);
'One Diffusion to Generate Them All' (CVPR 2025);
'ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams' (CVPR 2025);
'Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation' (ECCV 2024);
'Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action' (CVPR 2024, Highlight);
'Can Language Models Laugh at YouTube Short-form Videos?' (EMNLP 2023);
'ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning' (ICCV 2021).
Research Experience
Research Scientist, PRIOR team, Allen Institute for AI, 2024.08 - Present;
Postdoctoral Young Investigator, PRIOR team, Allen Institute for AI, 2023.01 - 2024.07;
Research Intern, PRIOR team, Allen Institute for AI, 2023.01 - Present.
Education
Ph.D. in Computer Science and Engineering, Seoul National University (SNU), 2017.03 - 2023.02, Advisor: Prof. Gunhee Kim;
B.S. in Computer Science and Engineering (minor in Statistics), Seoul National University (SNU), 2010.03 - 2017.02.
Background
Research interests: computer vision, machine learning, and their applications to real-world problems, with a focus on multimodal representation learning, particularly for high-level video understanding and reasoning.