- Distributionally Robust Optimization For Language Modeling
- Optimizing Language Models for Human Preferences is a Causal Inference Problem
- Token-level Direct Preference Optimization
- SimPO: Simple Preference Optimization with a Reference-Free Reward
- KL Divergence: Forward vs Reverse?
Research Experience
Research internship at Microsoft Research Asia (MSRA).
Education
Master's student at Zhejiang University, advised by Assoc. Prof. Kun Kuang and Prof. Fei Wu.
Background
My research interests include model compression (data-free knowledge distillation, out-of-domain knowledge distillation, etc.), domain adaptation, and large-small model collaboration. I currently focus on large language models (LLMs), especially reinforcement learning for LLMs.
Miscellany
The personal website uses the Chirpy theme for Jekyll.