ECCV 2024: Agent Attention: On the Integration of Softmax and Linear Attention
CVPR 2024: GSVA: Generalized Segmentation via Multimodal Large Language Models
Preprint: DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
ICCV 2023: Adaptive Rotated Convolution for Rotated Object Detection
CVPR 2023: Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
ICLR 2023: Budgeted Training for Vision Transformer
CVPR 2022: Vision Transformer with Deformable Attention (Best Paper Finalist)
CVPR 2021: 3D Object Detection with Pointformer
Preprint: Demystify Mamba in Vision: A Linear Attention Perspective
Honors and Awards: Multiple scholarships from Tsinghua University, including the Friend of Tsinghua – Ubiquant Scholarship, the Hefei Talent Scholarship, and the Samsung Scholarship.
Research Experience
Currently focusing on topics related to dynamic and efficient large multimodal models.
Background
Fourth-year Ph.D. candidate in the Department of Automation, Tsinghua University, advised by Prof. Gao Huang and Prof. Shiji Song. Research mainly focuses on deep learning for computer vision and multimodal learning, specifically Vision Transformers (2D/3D), dynamic neural architectures, and large multimodal models.