- LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
- VMDT: Decoding the Trustworthiness of Video Foundation Models
- rLLM: A Framework for Post-Training Language Agents
- JudgeBench: A Benchmark for Evaluating LLM-Based Judges
- Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
- Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Research Experience
Project lead for rLLM; involved in multiple research projects in areas including LLM post-training and agentic AI.
Education
Currently a second-year PhD student in Computer Science at UC Santa Cruz, advised by Chenguang Wang. Previously completed a Bachelor’s degree in Computer Science and Mathematics and a Master’s degree in Computer Science, both at Washington University in St. Louis.
Background
Research interests include LLM post-training, agentic AI, and scaling test-time compute for hard-to-verify tasks. Also a project lead for rLLM.