Published multiple papers, including 'Towards Personalized Language Models via Inference-time Human Preference Optimization' (NeurIPS 2024 AFM), 'Diff-BBO: Diffusion-Based Inverse Modeling for Black-Box Optimization' (NeurIPS 2024 BDU), and 'Log-concave Sampling from a Convex Body with a Barrier: a Robust and Unified Dikin Walk' (NeurIPS 2024). Invited to give talks at various academic conferences.
Research Experience
Interned at IBM Research, Amazon, and Honda Research Institute, working on LLMs for personalization, RL for ranking and recommendation systems, and robotics. Experienced in fine-tuning LLMs and reward models, designing CoT prompting and reasoning frameworks, LLM decoding, and training R1-style reasoning LLMs using RL (e.g., PPO, GRPO).
Education
PhD candidate in Computer Science at UC San Diego (expected 2025), advised by Prof. Yian Ma; MSc in Computer Science from UC San Diego (2020).
Background
Primary research interests span reinforcement learning (RL), foundation models, and Bayesian inference, with a focus on fundamental challenges in sequential decision making under uncertainty. Recent interests center on LLM alignment and reasoning, in particular the role RL plays in both. The goal is to design provably efficient and practical algorithms with performance guarantees that deliver both statistical and computational benefits.
Miscellany
Received awards such as the NSF AIVO Travel Grant. Served as a reviewer for several international conferences (e.g., NeurIPS, AISTATS, AAAI, ICML, ICLR, ISIT) and journals (e.g., IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Information Theory).