ICLR 2025: "AgentRefine: Enhancing Agent Generalization through Refinement Tuning"
ACL 2024: "DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning"
EMNLP 2024: "How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with Really Good Data"
COLING 2024: 2 papers accepted
ACL 2024: 1 paper accepted
ICLR 2023: 1 paper accepted
EMNLP 2023: 4 papers accepted
ACL 2023: 4 papers accepted
EMNLP 2022: 4 papers accepted
COLING 2022: 3 papers accepted
CIKM 2022: 1 paper accepted
NAACL 2022: 2 papers accepted
SIGIR 2022: 1 paper accepted
ACL 2022: 1 paper accepted
arXiv: "SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild"
Research Experience
Mar 2023–Present: Full-time Researcher at Meituan LLM Group, focusing on reasoning models, MoE, and LLM alignment
Jun 2021–Mar 2023: Full-time Researcher at Meituan NLP Group, working on dialogue systems and dialogue pretraining
Jun 2020–Oct 2020: Research Intern at Alibaba DAMO Academy, focusing on recommendation systems
Mar 2020–Jun 2020: Research Intern at Tencent WeChat AI Lab, working on zero-shot learning and slot filling
Oct 2019–Mar 2020: Research Intern at Meituan NLP Group, researching GCN and dialogue systems
Background
Currently working in the Meituan LLM Group, with a research focus on reasoning models (e.g., o1), Mixture of Experts (MoE), and LLM alignment.
Research interests center on three key areas of Large Language Models (LLMs): complex reasoning, reinforcement learning in real-world settings, and LLM alignment.
In complex reasoning, the focus is on evolving foundation models and optimizing long chain-of-thought (Long-CoT) RL, with the aim of building new technical pipelines spanning pre-training to post-training.
In real-world reinforcement learning, the work explores LLM-driven end-to-end agent systems (e.g., DeepResearch, GUI agents, embodied agents), pushing the boundaries of intelligence through interaction with dynamic environments.
In LLM alignment, the focus is on scalable alignment learning, including data evaluation and optimization as well as preference-learning algorithms, to ensure models are both capable and aligned with human values.