Weitai Kang

Google Scholar ID: hDl0MkwAAAAJ
University of Illinois Chicago
Large Multimodal Model, Visual Grounding, AI Agent
Citations & Impact
All-time:
  • - Citations: 106
  • - H-index: 8
  • - i10-index: 5
  • - Publications: 13
  • - Co-authors: 21
Resume (English only)
Academic Achievements
  • - Publications:
  • * InfantAgent-Next (NeurIPS 2025)
  • * 3DResT (IEEE Transactions on Multimedia, 2025)
  • * Robin3D (ICCV 2025)
  • * ExpVG (arXiv, 2025)
  • * GuirlVG (arXiv, 2025)
  • * AttBalance (ACMMM 2025)
  • * Intent3D (ICLR 2025)
  • * SaCo (CVPR 2024)
  • * TokenTM (CVPR 2024)
  • * ACTRESS (arXiv, 2023)
  • * SegVG (ECCV 2024)
Research Experience
  • - Ph.D. research: Working with Prof. Yan Yan in Computer Science at the University of Illinois Chicago
  • - Internships: Adobe, SonyAI, Tencent, SenseTime
  • - Visiting Scholar: University of Central Florida, working with Prof. Mubarak Shah
  • - Teaching Assistant: CS 577: Deep Learning at Illinois Institute of Technology
Education
  • - Degree: Ph.D. candidate
  • - University: University of Illinois Chicago
  • - Advisor: Prof. Yan Yan
  • - Expected Graduation: 2027
  • - Major: Computer Science
  • - Bachelor's Degree: Mathematics, Sun Yat-sen University (graduated 2022)
  • - Honors: Outstanding Student Scholarship each year
Background
  • - Research Interests: Multimodal fine-grained understanding across image, GUI, 3D, and video domains. Focuses on building multimodal large language models (e.g., Robin3D) with optimal paradigm design (e.g., ExpVG) and training strategies (e.g., GuirlVG). Explores how to scale higher-quality data, propose stronger supervision signals (e.g., AttBalance, SegVG), and establish better benchmarks (e.g., Intent3D). Also works on improving overall system efficiency (e.g., ACTRESS, 3DResT, INTP-Video-LLM), empowering AI agents (e.g., InfantAgent-Next), and making their decision-making mechanisms more interpretable (e.g., SaCo, TokenTM).