Resume (English only)
Academic Achievements
Selected Publications:
- Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
- VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
- VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
- MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
- LTD-Bench: Evaluating Large Language Models by Letting Them Draw
- Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
- Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
- Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
- FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
- Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
- PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications
Research Experience
Principal Researcher and Team Manager at Tencent Youtu Lab.
Education
Master's degree from Xiamen University (2018), supervised by Prof. Rongrong Ji. Bachelor's degree from Zhengzhou University (2015), advised by Prof. Mingliang Xu.
Background
Research interests: deep learning and its applications in computer vision and natural language processing. Previously a Principal Researcher and Team Manager at Tencent Youtu Lab.
Miscellany
Reviewer for ICML, ICLR, NeurIPS, CVPR, ICCV, ECCV, AAAI and TPAMI.