Scholar
Haoxuan You
Google Scholar ID: BhysChMAAAAJ
Apple AI/ML
Computer Vision
Deep Learning
NLP
Citations & Impact
All-time
Citations
6,150
H-index
22
i10-index
25
Publications
20
Co-authors
14
list available
Contact
Email
haoxuanyou@gmail.com
Publications
10 items
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
2025
Cited
0
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
2025
Cited
0
HoliTom: Holistic Token Merging for Fast Video Large Language Models
2025
Cited
0
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
2025
Cited
0
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
arXiv.org · 2024
Cited
4
MM-Ego: Towards Building Egocentric Multimodal LLMs
arXiv.org · 2024
Cited
12
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
arXiv.org · 2024
Cited
0
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions
ACM Multimedia · 2024
Cited
0
Co-authors
14 total
Shih-Fu Chang
Professor of Electrical Engineering and Computer Science, Columbia University
Zhecan James Wang
Columbia University, UCLA
Yue Gao
Tsinghua University
Can Qin
Salesforce
Yifan Feng 丰一帆
Tsinghua University
Kai-Wei Chang
Associate Professor, UCLA
Luowei Zhou
Senior Staff Research Lead, Samsung AI / Ex-Deepmind