Xiujun Li

Google Scholar ID: SW_WaQ0AAAAJ
University of Washington / Apple
Reinforcement Learning · Artificial Intelligence · NLP · MLLM · Dialog
Citations & Impact (all-time)
  • Citations: 9,696
  • H-index: 37
  • i10-index: 59
  • Publications: 20
  • Co-authors: 48
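For reference, the two citation metrics above are simple functions of a scholar's per-paper citation counts: the h-index is the largest h such that h papers each have at least h citations, and the i10-index counts papers with at least 10 citations. A minimal sketch (the citation counts used here are hypothetical, not this scholar's actual papers):

```python
def h_index(citations):
    # Sort citation counts descending; h is the number of ranks r
    # (1-based) where the r-th paper still has at least r citations.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def i10_index(citations):
    # Number of papers with at least 10 citations.
    return sum(1 for c in citations if c >= 10)

papers = [50, 30, 22, 15, 9, 4]  # hypothetical citation counts
print(h_index(papers))    # -> 5 (five papers have >= 5 citations each)
print(i10_index(papers))  # -> 4 (four papers have >= 10 citations)
```

The descending sort makes the predicate `c >= rank` hold for a prefix of the list and fail thereafter, so counting the true cases yields h directly.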
Academic Achievements
  • Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels? (arXiv 2024)
  • Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms (ICLR 2025)
  • Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset (ICLR 2025)
  • Multimodal Autoregressive Pre-training of Large Vision Encoders (CVPR 2025)
  • VinVL: Making Visual Representations Matter in Vision-Language Models (CVPR 2021)
  • Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks (ECCV 2020)
  • Robust navigation with language pretraining and stochastic sampling (EMNLP 2019)
  • End-to-End Task-Completion Neural Dialogue Systems (IJCNLP 2017)
  • Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning (EMNLP 2017)
  • Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning (ACL 2018)
Research Experience
  • Spent five years at Microsoft Research before joining Apple. Research experience spans dialogue systems, deep reinforcement learning, NLP, vision and language, and multimodal LLMs.
Education
  • PhD from UW CSE in 2024, advised by Yejin Choi.
Background
  • Currently a Research Scientist at Apple. Research interests include multimodal LLMs, LLMs, NLP, vision and language, and video generation.