Publications
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels? (arXiv 2024)
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms (ICLR 2025)
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset (ICLR 2025)
Multimodal Autoregressive Pre-training of Large Vision Encoders (CVPR 2025)
VinVL: Revisiting Visual Representations in Vision-Language Models (CVPR 2021)
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks (ECCV 2020)
Robust Navigation with Language Pretraining and Stochastic Sampling (EMNLP 2019)
End-to-End Task-Completion Neural Dialogue Systems (IJCNLP 2017)
Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning (EMNLP 2017)
Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning (ACL 2018)
Research Experience
Worked for five years at Microsoft Research before joining Apple. Research experience spans dialogue systems, deep reinforcement learning, NLP, vision and language, and multimodal LLMs.
Education
PhD from UW CSE in 2024, advised by Yejin Choi.
Background
Currently a Research Scientist at Apple. Research interests include multimodal LLMs, NLP, vision and language, and video generation.