Paper 'Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners' on Arxiv.
Paper 'CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning' accepted by AAMAS 2026.
Paper 'Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning' accepted by AAAI 2025.
Paper 'A two-stage reinforcement learning-based approach for multi-entity task allocation' accepted by Engineering Applications of Artificial Intelligence.
Paper 'Exploration and Anti-Exploration with Distributional Random Network Distillation' accepted by ICML 2024.
Paper 'BATON: Aligning Text-to-Audio Model with Human Preference Feedback' accepted by IJCAI 2024.
Paper 'Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model' accepted by CVPR 2024.
Paper 'Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model' selected as HuggingFace daily paper.
Paper 'Exploration by Random Distribution Distillation' on Arxiv.
Paper 'GTLMA: Generalizable Hierarchical Learning for Tasks with Variable Entities' presented at 2023 International Conference on Frontiers of Robotics and Software Engineering (FRSE).
Paper 'CMBE: Curiosity-driven Model-Based Exploration for Multi-Agent Reinforcement Learning in Sparse Reward Settings' (Oral).
Research Experience
Researcher on the Tencent Hunyuan X team, conducting research on reinforcement learning and large language models.
Education
Graduated from Tsinghua University in 2025 with a degree in Artificial Intelligence, supervised by Prof. Xiu Li, with extensive research guidance from senior labmate Jiafei Lyu.
Background
Currently a researcher on the Tencent Hunyuan X team, focusing on reinforcement learning (RL) and large language models (LLMs). Main research interests include reinforcement learning, especially LLM post-training, exploration mechanisms, and multi-agent reinforcement learning. Particularly interested in applying mathematical theory to optimize LLM and RL methods.