Scholar

Haoxuan You

Google Scholar ID: BhysChMAAAAJ

Apple AI/ML

Computer Vision, Deep Learning, NLP

Citations & Impact

All-time

Citations: 6,150

H-index: 22

i10-index: 25

Publications: 20

Co-authors: 14

Contact

Email: haoxuanyou@gmail.com

Publications

10 items

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

2025

Cited

0

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios

2025

Cited

0

HoliTom: Holistic Token Merging for Fast Video Large Language Models

2025

Cited

0

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models

2025

Cited

0

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models

arXiv.org · 2024

Cited

4

MM-Ego: Towards Building Egocentric Multimodal LLMs

arXiv.org · 2024

Cited

12

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images

arXiv.org · 2024

Cited

0

Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

ACM Multimedia · 2024

Cited

0

Co-authors

14 total

Professor of Electrical Engineering and Computer Science, Columbia University

Zhecan James Wang

Columbia University, UCLA

Tsinghua University

Yifan Feng 丰一帆

Tsinghua University

Associate Professor, UCLA

Senior Staff Research Lead, Samsung AI / ex-DeepMind