Jiaming Han

Google Scholar ID: vgcxKEcAAAAJ
PhD Student, CUHK MMLab
Computer Vision · Vision-Language · Visual Generation
Citations & Impact (All-time)
  • Citations: 4,943
  • H-index: 13
  • i10-index: 14
  • Publications: 19
  • Co-authors: 11
Academic Achievements
  • Publications:
    - Bridge: Growing Visual Generative Capacity for Pre-Trained MLLMs
    - Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations (NeurIPS 2025)
    - CrossLMM: Decoupling Long Video Sequences from LLMs via Dual Cross-Attention Mechanisms
    - Multimodal Long Video Modeling Based on Temporal Dynamic Context
    - Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation (CoRL 2025)
    - Retrieval-Augmented Personalization for Multimodal Large Language Models (CVPR 2025)
    - OneLLM: One Framework to Align All Modalities with Language (CVPR 2024)
    - ImageBind-LLM: Multi-modality Instruction Tuning
    - LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
    - LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention (ICLR 2024)
    - Few-Shot Object Detection via Variational Feature Aggregation (AAAI 2023)
Research Experience
  • Interned at Bytedance Seed, Shanghai AI Lab, and Tencent YouTu Lab.
Education
  • Received a Master's degree from Wuhan University and a Bachelor's degree from Central South University.
Background
  • Currently a PhD student at MMLab, CUHK, advised by Prof. Xiangyu Yue. Recent research focuses on efficient and unified multimodal LLMs, such as LLaMA-Adapter, OneLLM, and Tar.