Xinyu Huang

Google Scholar ID: 1O5b3VcAAAAJ
ByteDance Seed
Computer Vision, Multi-Modality, Visual Recognition
Citations & Impact (all-time)
Citations: 1,261
H-index: 7
i10-index: 7
Publications: 11
Co-authors: 12
Resume
Academic Achievements
  • Proposed the Recognize Anything Model (RAM) family, including RAM and RAM++, which outperform OpenAI's CLIP by over 20 points in fine-grained perception and support open-set image tagging.
  • Developed Tag2Text, a tagging-guided vision-language model enabling simultaneous image tagging and comprehensive captioning (ICLR 2024).
  • Introduced IDEA, an approach to enhance text diversity in vision-language pre-training via online multi-label recognition (ACM MM 2022).
  • Proposed simple yet robust loss designs for multi-label learning with missing labels (arXiv 2021).
  • Developed MGPO (Multi-Turn Grounding-Based Reinforcement Learning), which enables large multimodal models (LMMs) to iteratively focus on key image regions for high-resolution visual reasoning (arXiv 2025).
  • The RAM project has garnered 3,200+ GitHub stars and is widely adopted in the open-source community.