Zhihan Xiong

Google Scholar ID: OsSiEMEAAAAJ
University of Washington
Research areas: reinforcement learning, bandits, active learning
Citations & Impact
  • - All-time:
  • * Citations: 103
  • * H-index: 5
  • * i10-index: 2
  • * Publications: 14
  • * Co-authors: 7
Resume (English only)
Academic Achievements
  • - Publications:
  • * Hybrid Preference Optimization for Alignment: Faster Convergence Rates by Combining Offline Preferences with Online Exploration
  • * Language Model Preference Evaluation with Multiple Weak Evaluators
  • * Policy Mirror Descent with Dual Function Approximation
  • * LoRe: Personalizing LLMs via Low-Rank Reward Modeling
  • * A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
  • * A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning
  • * Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
  • * Learning in Congestion Games with Bandit Feedback
  • * Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
  • * Fourier Learning with Cyclical Data
  • * Selective Sampling for Online Best-arm Identification
  • * Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
  • - Conference Papers:
  • * COLM 2025, AISTATS 2024, CODE@MIT 2023, ICLR 2024, NeurIPS 2022, ICML 2022, NeurIPS 2021, AAAI 2020
Research Experience
  • - Visiting Researcher at Meta (FAIR Labs), Oct 2022 -- Sep 2024
  • - Research Intern at ByteDance (AML Group), Jun 2021 -- Sep 2021
  • - Applied Scientist Intern at Zillow (Personalization Team), Jun 2019 -- Sep 2019
Education
  • - Ph.D. in Computer Science & Engineering from the Paul G. Allen School of Computer Science & Engineering, University of Washington, 2025, Advisor: Prof. Maryam Fazel
  • - Master's Degree in Statistics from Stanford University, 2020
  • - Bachelor's Degree in Mathematics and Engineering Physics from University of Illinois at Urbana-Champaign, 2018, Advisor: Prof. Pierre Moulin
Background
  • - Research Interests: Theory and application of reinforcement learning and bandit problems
  • - Current Position: Research Scientist at Meta
  • - Advisor: Prof. Maryam Fazel
  • - Collaborators: Prof. Simon S. Du, Prof. Kevin Jamieson, Dr. Lin Xiao
Miscellany
  • - Reviewer for: ICML (2021, 2022, 2023, 2024), NeurIPS (2021, 2022, 2023), and ICLR (2022, 2023, 2024)
  • - Teaching Experience:
  • * CSE 541: Interactive Learning, Teaching Assistant, Spring 2025, University of Washington
  • * CSE/EE/ME 578: Convex Optimization, Teaching Assistant, Winter 2025, University of Washington