Published multiple papers, including 'Self-playing Adversarial Language Game Enhances LLM Reasoning' and 'Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Games'. Served as an Area Chair for ARR 2025. Multiple papers accepted at NAACL 2025, NeurIPS 2024, EMNLP 2024, ACL 2024, and other international conferences.
Research Experience
Currently a researcher at Alibaba Group, leading the Quark Foundation LLM RL Team. Previously worked with the RL & Agent Team at Moonshot AI (Kimi) and the Hunyuan LLM Team at Tencent AI Lab. Conducted research on Bayesian and probabilistic machine learning during graduate school.
Education
Received a Ph.D. from the Department of Electrical and Computer Engineering at Duke University in 2021, advised by Dr. Lawrence Carin. Graduated with a B.S. from the Department of Mathematical Sciences at Tsinghua University in 2017.
Background
Researcher at Alibaba Group, leading the Quark Foundation LLM RL Team. Focuses on enhancing LLMs’ capabilities via RLHF, RLVR, and agentic RL. Previously a member of the RL & Agent Team at Moonshot AI (Kimi) and the Hunyuan LLM Team at Tencent AI Lab. Research interests include LLM self-play, alignment (RLHF), text generation, and NLP fairness.