Publications
DataComp-LM: In search of the next generation of training sets for language models, NeurIPS 2024
DataComp: In search of the next generation of multimodal datasets, NeurIPS 2023
LAION-5B: An open large-scale dataset for training next generation image-text models, NeurIPS 2022, Outstanding paper award
Robust fine-tuning of zero-shot models, CVPR 2022, Best paper finalist
Retiring Adult: New Datasets for Fair Machine Learning, NeurIPS 2021 & EAAMO 2021, New Horizons Award
Measuring Robustness to Natural Distribution Shifts in Image Classification, NeurIPS 2020
Do ImageNet Classifiers Generalize to ImageNet?, ICML 2019
Towards Deep Learning Models Resistant to Adversarial Attacks, ICLR 2018
Practical and Optimal LSH for Angular Distance, NIPS 2015
Research Experience
Assistant professor at Stanford in the Computer Science Department and Stanford Data Science; member of the technical staff at Anthropic and LAION.
Education
PhD, MIT, 2018. Thesis: 'Algorithms Above the Noise Floor'. George M. Sprowls Award (for the best PhD theses in computer science at MIT).
Background
Research interests center on the foundations of machine learning, often with a focus on datasets, multimodality, reliable generalization, and language models.