MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research, CVPR 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature, CVPR 2025
Video Action Differencing, ICLR 2025
Viewpoint Textual Inversion: Discovering Scene Representations and 3D View Control in 2D Diffusion Models, ECCV 2024 (Outstanding Paper Award at the ECCV Workshop)
Orientation-invariant autoencoders learn robust representations for shape profiling of cells and organelles, Nature Communications 2024
Global organelle profiling reveals subcellular localization and remodeling at proteome scale, Cell 2024
Squeezed Diffusion Models, Preprint
The Impact of Image Resolution on Biomedical Multimodal Large Language Models, MLHC 2025
Can Large Language Models Match the Conclusions of Systematic Reviews?, Preprint
Research Experience
Works on vision-language models, agent-based systems, and evaluation. Develops multimodal large language models for biology research.
Education
Stanford University, Computer Vision and Machine Learning, Advisor: Serena Yeung-Levy
Background
Stanford PhD student working on computer vision and machine learning. Advised by Serena Yeung-Levy and supported by the Quad Fellowship.