- (EMNLP 2025 Findings, Co-First author) DLPO: Towards a Robust, Efficient, and Generalizable Pr
Research Experience
Internships:
- iFLYTEK (Hefei), Research Intern, September 2025 – Present
- Du Xiaoman Financial (Beijing), Research Intern, January 2025 – February 2025
- Westlake University (Hangzhou), Research Intern, December 2023 – September 2024
Education
First-year Master's student at Harbin Institute of Technology (HIT) and a member of SCIR LA. Supervised by Professor Wanxiang Che, Professor Libo Qin, and Ph.D. candidate Qiguang Chen.
Background
Current research interests focus on reinforcement learning for LLMs (RL4LLM) and LLM reasoning. Prior research experience in safe RL and offline RL.