Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models (Preprint, equal contribution): Used self-play RL and hidden Chain-of-Thought to discover diverse adversarial attacks for safer LLM alignment
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset (NeurIPS 2023, equal contribution): Introduced a human-preference dataset demonstrating that decoupling helpfulness and harmlessness annotations improves safety alignment without loss of helpfulness
Safe RLHF: Safe Reinforcement Learning from Human Feedback (ICLR 2024 Spotlight): Proposed a constrained RLHF algorithm using Lagrangian methods to balance harmlessness and helpfulness, outperforming existing alignment methods
Baichuan 2: Open Large-scale Language Models (Technical Report, author): Contributed to open-sourcing the Baichuan 2 models, which achieved state-of-the-art results among open-source models on benchmarks including MMLU, CMMLU, GSM8K, HumanEval, and SuperCLUE-agent
Proactive Multi-Camera Collaboration For 3D Human Pose Estimation (ICLR 2023, equal contribution): Developed a multi-agent RL framework for collaborative 3D pose estimation in dynamic crowds using Shapley-value-inspired rewards