Mickel Liu
Google Scholar ID: 2oog2ZcAAAAJ
University of Washington
Reinforcement Learning · Multi-Agent Learning · Natural Language Processing
Citations & Impact (All-time)
  • Citations: 2,080
  • H-index: 6
  • i10-index: 6
  • Publications: 8
  • Co-authors: 7
Academic Achievements
  • Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models (Preprint, equal contribution): Used self-play RL and hidden Chain-of-Thought to discover diverse adversarial attacks for safer LLM alignment
  • BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset (NeurIPS 2023, equal contribution): Introduced a human-preference dataset showing that decoupling helpfulness and harmlessness improves safety without performance loss
  • Safe RLHF: Safe Reinforcement Learning from Human Feedback (ICLR 2024 Spotlight): Proposed a constrained RLHF algorithm using Lagrangian methods to balance harmlessness and helpfulness, outperforming existing alignment methods
  • Baichuan 2: Open Large-scale Language Models (Technical Report, author): Contributed to open-sourcing the Baichuan 2 models, which achieve state-of-the-art results among open-source models on benchmarks including MMLU, CMMLU, GSM8K, HumanEval, and SuperCLUE-agent
  • Proactive Multi-Camera Collaboration For 3D Human Pose Estimation (ICLR 2023, equal contribution): Developed a multi-agent RL framework for collaborative 3D pose estimation in dynamic crowds using Shapley-value-inspired rewards