1. POLAR: Pre-Trained Policy Discriminators are General Reward Models (NeurIPS 2025)
2. Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling (ACL 2025)
3. Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models (NAACL 2024)
4. Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning (ICML 2024)
Research Experience
1. 2025.1 - 2025.3: Large Language Model Center Group of Shanghai Artificial Intelligence Laboratory, Shanghai, China
2. 2024.8 - 2024.12: ByteDance, AI Lab Research, Shanghai, China
3. 2023.12 - 2024.3: General Safety Group of Shanghai Artificial Intelligence Laboratory, Shanghai, China
Education
1. Master's Degree: School of Computer Science, Fudan University, Advisors: Prof. Xuanjing Huang and Assoc. Prof. Tao Gui, Time: Fall 2024 - Present
2. Bachelor's Degree: Fudan University, Advisor: Assoc. Prof. Tao Gui
Background
Currently a Master's student at the School of Computer Science, Fudan University (since fall 2024), and a member of the FudanNLP Lab, co-advised by Prof. Xuanjing Huang and Assoc. Prof. Tao Gui. Previously earned a bachelor's degree from Fudan University, advised by Assoc. Prof. Tao Gui.