Multiple papers accepted at ICML 2025, including 'BRiTE' (equal contribution) and 'Reward-Augmented Data Enhances Direct Preference Alignment of LLMs' (equal contribution)
Published 'Provably Mitigating Overoptimization in RLHF' at NeurIPS 2024
Published 'Self-Exploring Language Models' in Transactions on Machine Learning Research (TMLR); the paper was awarded Best Paper at ICML AutoRL Workshop 2024
Published 'Reason for Future, Act for Now' (equal contribution) and 'Adaptive-Gradient Policy Optimization' at ICML 2024
Published 'Model-Based Reparameterization Policy Gradient Methods' at NeurIPS 2023 (co-advised by Prof. Zhaoran Wang and Prof. Tuo Zhao)
Published 'Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics' at ICML 2023
Multiple 2025 preprints on LLM reasoning and RL, e.g., 'Learning to Reason as Action Abstractions' and 'Beyond Markovian'
Work featured by MIT Technology Review China and Hugging Face Daily Papers