Published "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena" at NeurIPS 2023
Published two papers at ICLR 2024: "LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset" and "LLM-Assisted Code Cleaning for Training Accurate Code Generators"
Released a technical report on Chatbot Arena (arXiv preprint)
Proposed automated LLM evaluation benchmarks such as MT-Bench and Arena-Hard
Vicuna has been downloaded over 8 million times and received 1000+ citations
FastChat has garnered over 30K GitHub stars and 200+ contributors
Hosted a Kaggle competition for human preference prediction (May 2024)
Cluster-GCN is widely integrated into platforms like DGL and PyTorch Geometric
Research Experience
Conducting AI evaluation research at SkyLab
Leading or contributing to multiple AI evaluation and LLM-related projects, including Chatbot Arena, Multimodal Arena, RedTeam Arena, WebDev Arena, Arena-Hard, and BenchBuilder
Developed FastChat, a multi-model serving framework powering Chatbot Arena
Contributed to Vicuna, a high-quality LLM chatbot
Worked on SkyPilot, an intercloud system for AI and batch jobs
Developed Cluster-GCN for scalable training of large graph neural networks