Yuetai Li

Google Scholar ID: 6Sof_XAAAAAJ
University of Washington
LLM Agent, LLM Reasoning, Post-training, Trustworthy AI
Citations & Impact (All-time)
  • Citations: 227
  • H-index: 7
  • i10-index: 6
  • Publications: 14
  • Co-authors: 10
Resume (English only)
Academic Achievements
  • Proposed 'Small Model Learnability Gap': small models benefit more from shorter, simpler reasoning chains than from long CoT or distillation from large teachers
  • Discovered that RL-trained math models generalize well to non-reasoning domains (e.g., alignment), while SFT models lose this capacity; identified sampling policy as key to generalization
  • Identified 'Temporal Forgetting': 76.7% of AIME problems were solved correctly at intermediate checkpoints during RL training of DeepSeek-R1-1.5B, but only 30% remained correct in the final model; proposed 'Temporal Sampling' to leverage training dynamics for answer diversity
  • Introduced SafeChain dataset to improve safety alignment without compromising reasoning; showed that long CoT does not necessarily enhance safety
  • TinyV: proposed a lightweight LLM-based verifier to address >38% false negatives in answer verification during RL training, improving reward estimation accuracy
  • Visual Sphinx: developed a four-stage pipeline that generates 660K visual logic data samples for RL training of multimodal reasoning models
  • ICLR BiAlign Workshop (Oral), awarded 'Best Honorable Mention'