LLaVA series, including LLaVA-1.5 (SoTA on 11 open-source VLM benchmarks), LLaVA-NeXT, LLaVA-OneVision, LLaVA-Video, LLaVA-Critic, LLaVA-Med (NeurIPS 2023 Datasets and Benchmarks Track Spotlight), LLaVA-Interactive, and LLaVA-Plus.
Developed the proprietary industry-leading VLM Seed-VL-1.5 for image and video understanding.
Published numerous high-impact papers at NeurIPS (Oral/Spotlight), CVPR (Highlights), and ECCV, as well as a survey in Foundations and Trends® in Computer Graphics and Vision.
Notable projects include REACT (CVPR 2023), GLIGEN (CVPR 2023), X-Decoder, K-LITE (NeurIPS 2022 Oral), ELEVATER, and FocalNet.
Authored a 110-page perspective paper 'Multimodal Foundation Models: From Specialists to General-Purpose Assistants' and delivered the CVPR 2023 tutorial on the topic.
Served as Area Chair for NeurIPS, ICML, ICLR, EMNLP, and TMLR, and as Guest Editor for an IJCV special issue on 'the promises and dangers of large vision models'.