"RedStone: Curating General, Code, Math, and QA Data for Large Language Models", arXiv preprint, 2024
"Kosmos-2.5: A Multimodal Literate Model", arXiv preprint, 2023 (Equal Contribution)
"Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models", ICLR Workshop, 2024
"TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering", ECCV 2024 (Oral); Top 10 in Hugging Face Space Trending List (Dec 2023)
"TextDiffuser: Diffusion Models as Text Painters", NeurIPS 2023 (Equal Contribution); Top 10 in Hugging Face Space Trending List (Jun 2023)
"LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking", ACM Multimedia 2022 (Oral); Over 100 million downloads on Hugging Face (as of Feb 2024)
Research Experience
Jun. 2024 – present: Senior Researcher, General Artificial Intelligence Group, Microsoft Research Asia – Vancouver; Topics: multimodal AI, general AI, and large foundation models
Jan. 2023 – Jun. 2023: Visiting Student, Language Technology Lab, University of Cambridge; Advisor: Prof. Nigel Collier; Topic: multimodal instruction-following models
Jul. 2021 – Jun. 2024: Research Intern, Natural Language Computing (now GenAI) Group, Microsoft Research Asia – Beijing; Mentors: Dr. Lei Cui and Dr. Furu Wei; Topics: multimodal document foundation models; visual text rendering with diffusion models
Jun. 2019 – Jul. 2021: Research Intern, Multimedia Search and Mining Group, Microsoft Research Asia – Beijing; Mentors: Dr. Bei Liu and Dr. Jianlong Fu; Topics: vision-language pre-training; image-and-text generation
Jul. 2017 – Jul. 2018: Research Intern, Multimedia Search and Mining Group, Microsoft Research Asia – Beijing; Mentors: Dr. Qi Dai and Dr. Tao Mei; Topic: video action detection