Zhihan Xiong

Google Scholar ID: OsSiEMEAAAAJ

University of Washington

reinforcement learning, bandits, active learning

Citations & Impact

Citations (all-time): 103

Resume (English only)

Academic Achievements

- Publications:
* Hybrid Preference Optimization for Alignment: Faster Convergence Rates by Combining Offline Preferences with Online Exploration
* Language Model Preference Evaluation with Multiple Weak Evaluators
* Policy Mirror Descent with Dual Function Approximation
* LoRe: Personalizing LLMs via Low-Rank Reward Modeling
* A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
* A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
* Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
* Learning in Congestion Games with Bandit Feedback
* Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
* Fourier Learning with Cyclical Data
* Selective Sampling for Online Best-arm Identification
* Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
- Conference Papers:
* COLM 2025, AISTATS 2024, CODE@MIT 2023, ICLR 2024, NeurIPS 2022, ICML 2022, NeurIPS 2021, AAAI 2020

Research Experience

- Visiting Researcher at Meta (FAIR Labs), Oct 2022 -- Sep 2024
- Research Intern at ByteDance (AML Group), Jun 2021 -- Sep 2021
- Applied Scientist Intern at Zillow (Personalization Team), Jun 2019 -- Sep 2019

Education

- Ph.D. in Computer Science & Engineering from the Paul G. Allen School of Computer Science & Engineering, University of Washington, 2025, Advisor: Prof. Maryam Fazel
- Master's Degree in Statistics from Stanford University, 2020
- Bachelor's Degree in Mathematics and Engineering Physics from University of Illinois at Urbana-Champaign, 2018, Advisor: Prof. Pierre Moulin

Background

- Research Interests: Theory and application of reinforcement learning and bandit problems
- Current Position: Research Scientist at Meta
- Advisor: Prof. Maryam Fazel
- Collaborators: Prof. Simon S. Du, Prof. Kevin Jamieson, Dr. Lin Xiao

Miscellany

- Reviewer for: ICML (2021, 2022, 2023, 2024), NeurIPS (2021, 2022, 2023), and ICLR (2022, 2023, 2024)
- Teaching Experience:
* CSE 541: Interactive Learning, Teaching Assistant, Spring 2025, University of Washington
* CSE/EE/ME 578: Convex Optimization, Teaching Assistant, Winter 2025, University of Washington

Co-authors

7 total