Publications
- Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
- Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
- SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
- SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- A Simple Romance Between Multi-Exit Vision Transformer and Token Reduction
- Function-Consistent Feature Distillation
Research Experience
- Currently a Ph.D. student at MMLab, involved in multiple research projects, including Lumina-Video, Lumina-mGPT, and Lumina-Next.
- During master's studies at the VIPL lab, participated in various research projects.
Education
- 2024.09 - Present: MMLab, The Chinese University of Hong Kong (Ph.D.), Supervisor: Prof. Hongsheng Li
- 2021.09 - 2024.06: VIPL Lab, Institute of Computing Technology, Chinese Academy of Sciences (Master), Supervisors: Prof. Shiguang Shan and Prof. Meina Kan
- 2017.09 - 2021.06: School of Software Engineering, Tongji University (Bachelor)
Background
- Research Interests: Multimodal understanding and generation
- Background: Currently a first-year Ph.D. student at MMLab, CUHK, supervised by Prof. Hongsheng Li. Previously obtained a master's degree from the VIPL lab at the Institute of Computing Technology, Chinese Academy of Sciences, supervised by Prof. Shiguang Shan and Prof. Meina Kan.