Fushuo Huo
Google Scholar ID: tcX5RMYAAAAJ
The Hong Kong Polytechnic University
Large Vision Language Model
Multimodal Learning
Trustworthy AI
Citations & Impact (all-time)
Citations: 757
H-index: 16
i10-index: 17
Publications: 20
Co-authors: 9
Publications (7 items)
- PrismWF: A Multi-Granularity Patch-Based Transformer for Robust Website Fingerprinting Attack, 2026. Cited: 0
- Towards Robust Multimodal Learning in the Open World, 2025. Cited: 0
- Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection, 2025. Cited: 0
- Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models, 2025. Cited: 0
- EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models, 2025. Cited: 0
- On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View, 2025. Cited: 0
- Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models, arXiv.org, 2024. Cited: 8
Resume (English only)
Academic Achievements
- Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models, ICLR 2025.
- Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection, ECCV 2024.
- C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation, CVPR 2024, Highlight.
- PROCC: Progressive cross-primitive consistency for open-world compositional zero-shot learning, AAAI 2024.
- Non-Exemplar Online Class-incremental Continual Learning via Dual-prototype Self-augment and Refinement, AAAI 2024.
- REQA: Coarse-to-fine Assessment of Image Quality to Alleviate the Range Effect, Journal of Visual Communication and Image Representation 2024.
- UTDNet: A unified triplet decoder network for multimodal salient object detection, Neural Networks 2023.
- Graph knows unknowns: Reformulate zero-shot learning as sample-level graph recognition, AAAI 2023.
- (ML)^2P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning, CVPR 2023.
- Towards Unbiased Multi-Label Zero-Shot Learning with Pyramid and Semantic Attention, IEEE Transactions on Multimedia 2022.
- Three-stream interaction decoder network for RGB-thermal salient object detection, Knowledge-Based Systems 2022.
- Spatiotemporal regularization correlation filter with response feedback, Journal of Electronic Imaging 2022.
- Real-time One-stream Semantic-guided Refinement Network for RGB-Thermal Salient Object Detection, IEEE Transactions on Instrumentation and Measurement 2022.
- Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection, IEEE Transactions on Circuits and Systems.
Research Experience
- 2024.04-2024.11: Research Intern, supervised by Dr. Zhong Zhang and Peilin Zhao, Tencent AI Lab.
Education
- 2022.09-2025.09 (expected): Ph.D., supervised by Prof. Song Guo and Wenchao Xu, The Hong Kong Polytechnic University.
- 2024.10-2025.09 (expected): Visiting Ph.D., supervised by Prof. Dacheng Tao, Nanyang Technological University.
- 2019.09-2022.06: M.S., supervised by Prof. Xuegui Zhu and Lei Zhang, Chongqing University.
- 2015.09-2019.06: B.S. (Minor in Finance), China University of Mining and Technology.
Background
Research Interests: Trustworthy AI, Elastic AI, Multimodal Learning, Vision-Language Models.
Miscellany
Contact: fushuo.huo AT connect dot polyu dot hk; Phone: +86 18751795276
Other Links: [Google Scholar] [GitHub] [WeChat]
Co-authors (9 total)
- Song Guo, Chair Professor of CSE, HKUST
- Jingcai Guo, Hong Kong Polytechnic University
- Wenchao Xu, Hong Kong University of Science and Technology
- Haozhao Wang, Huazhong University of Science and Technology
- Lei Zhang, Chongqing University
- Qihua Zhou, Shenzhen University
- Peilin Zhao (赵沛霖), Tencent
- Dacheng Tao, Nanyang Technological University