Resume (English only)
Academic Achievements
Proposed the Recognize Anything Model (RAM) family, including RAM and RAM++, which outperform OpenAI's CLIP by over 20 points in fine-grained perception and support open-set image tagging.
Developed Tag2Text, a tagging-guided vision-language model enabling simultaneous image tagging and comprehensive captioning (ICLR 2024).
Introduced IDEA, an approach to enhance text diversity in vision-language pre-training via online multi-label recognition (ACM MM 2022).
Proposed simple yet robust loss designs for multi-label learning with missing labels (arXiv 2021).
Developed MGPO (Multi-turn Grounding-based Policy Optimization), a multi-turn grounding-based reinforcement learning approach that enables large multimodal models (LMMs) to iteratively focus on key image regions for high-resolution visual reasoning (arXiv 2025).
The RAM project has garnered 3,200+ GitHub stars and is widely adopted in the open-source community.