Academic Achievements
Published 'ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models', proposing a five-stage ConvNeXt visual encoder to improve high-resolution understanding and efficiency
Published 'Domain Adaptation via Prompt Learning' (DAPrompt), introducing domain-specific prompts to mitigate information loss in domain alignment
Published 'On the Integration of Self-Attention and Convolution' (ACMix), presenting a unified operator with shared computation
Serves as a reviewer for top conferences (CVPR, NeurIPS, ICCV, ICML, ECCV) and journals (TIP, TCSVT)
Released the ConvLLaVA project and paper in May 2024
Launched a GitHub repository in June 2023 that collects foundation model papers and is open to collaboration
Background
Fifth-year Ph.D. candidate in the Department of Automation, Tsinghua University
Research interests: Computer Vision and Multimodal Foundation Models
Aims to enable machine learning models to understand and interact with the open world
Believes foundation models should be grounded in the physical world, with vision essential to real-world understanding
Actively seeking postdoctoral and industrial opportunities