
Zhuofan Xia

Google Scholar ID: m2M6b58AAAAJ

PhD candidate, Tsinghua University

Efficient Deep Learning, Computer Vision, Multimodal Learning

Citations & Impact

Citations (all-time): 2,454


Resume (English only)

Academic Achievements

ECCV 2024: Agent Attention: On the Integration of Softmax and Linear Attention
CVPR 2024: GSVA: Generalized Segmentation via Multimodal Large Language Models
Preprint: DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
ICCV 2023: Adaptive Rotated Convolution for Rotated Object Detection
CVPR 2023: Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
ICLR 2023: Budgeted Training for Vision Transformer
CVPR 2022: Vision Transformer with Deformable Attention (Best Paper Finalists)
CVPR 2021: 3D Object Detection with Pointformer
Preprint: Demystify Mamba in Vision: A Linear Attention Perspective
Honors and Awards: Multiple scholarships from Tsinghua University, including the Friend of Tsinghua – Ubiquant Scholarship, the Hefei Talent Scholarship, and the Samsung Scholarship.

Research Experience

Currently focusing on topics related to dynamic and efficient large multimodal models.

Background

Fourth-year Ph.D. candidate in the Department of Automation, Tsinghua University, advised by Prof. Gao Huang and Prof. Shiji Song. Research focuses on deep learning for computer vision and multimodal learning, specifically Vision Transformers (2D/3D), dynamic neural architectures, and large multimodal models.
