- Proposed Verlog: a multi-turn RL framework for LLM agents, built for long-horizon agentic tasks with highly variable episode lengths.
- Presented at NeurIPS 2025 (oral presentation) a sample-efficient method for online fine-tuning of LLM agents that uses in-context learning to convert sparse feedback into dense signals.
- Introduced DGPO at AAAI 2024: an on-policy framework for discovering multiple diverse optimal strategies for the same task in a single training process.
Research Experience
Looking for a part-time/summer internship.
Education
PhD student at the Robotics Institute (RI), Carnegie Mellon University, advised by Prof. Jeff Schneider. Undergraduate in Automation at Tsinghua University, where I worked with Prof. Jun Zhu.
Background
Research interests focus on LLM agents, deep reinforcement learning, and their applications in decision-making and robotics.