Published several papers, including 'Visual Perception by Large Language Model's Weights' (NeurIPS, 2024), 'Multi-Modal Generative Embedding Model' (arXiv), 'Stare at What You See: Masked Image Modeling without Reconstruction' (CVPR, 2023), and 'CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment' (ICLR, 2023), among others.
Research Experience
Worked as a researcher at ByteDance Research. Prior to that, he held research positions at NUS, Tencent WeChat, Shanghai AI Lab, and Microsoft Research Asia (MSRA). He was a main contributor to WeCLIP, a powerful multi-modal foundation model powering various WeChat applications, and also contributed to PixelDance, a video generation model.
Education
Ph.D. from the University of Science and Technology of China (USTC), advised by Jiebo Luo and Houqiang Li; B.S. from the School of the Gifted Young at USTC.
Background
Research interests include Multi-Modal Learning, Computer Vision, and Machine Learning. Much of his research focuses on Vision-and-Language Pre-training.
Miscellany
His personal website provides more details about his projects and contact information.