Fushuo Huo

Google Scholar ID: tcX5RMYAAAAJ

The Hong Kong Polytechnic University

Large Vision Language Model, Multimodal Learning, Trustworthy AI

Citations & Impact

All-time citations: 757

Publications

10 items

- UESF-Bench: Benchmarking and Probing for Unified Embodied Seeking and Following, 2026.
- SpatialBench: Is Your Spatial Foundation Model an All-Round Player?, 2026.
- TimeGuard: Channel-wise Pool Training for Backdoor Defense in Time Series Forecasting, 2026.
- PrismWF: A Multi-Granularity Patch-Based Transformer for Robust Website Fingerprinting Attack, 2026.
- Towards Robust Multimodal Learning in the Open World, 2025.
- Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection, 2025.
- Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models, 2025.
- EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models, 2025.

Resume

Academic Achievements

- Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models, ICLR 2025.
- Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection, ECCV 2024.
- C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation, CVPR 2024, Highlight.
- PROCC: Progressive cross-primitive consistency for open-world compositional zero-shot learning, AAAI 2024.
- Non-Exemplar Online Class-incremental Continual Learning via Dual-prototype Self-augment and Refinement, AAAI 2024.
- REQA: Coarse-to-fine Assessment of Image Quality to Alleviate the Range Effect, Journal of Visual Communication and Image Representation 2024.
- UTDNet: A unified triplet decoder network for multimodal salient object detection, Neural Networks 2023.
- Graph knows unknowns: Reformulate zero-shot learning as sample-level graph recognition, AAAI 2023.
- (ML)^2P-Encoder: On Exploration of Channel-Class Correlation for Multi-Label Zero-Shot Learning, CVPR 2023.
- Towards Unbiased Multi-Label Zero-Shot Learning with Pyramid and Semantic Attention, IEEE Transactions on Multimedia 2022.
- Three-stream interaction decoder network for RGB-thermal salient object detection, Knowledge-Based Systems 2022.
- Spatiotemporal regularization correlation filter with response feedback, Journal of Electronic Imaging 2022.
- Real-time One-stream Semantic-guided Refinement Network for RGB-Thermal Salient Object Detection, IEEE Transactions on Instrumentation and Measurement 2022.
- Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection, IEEE Transactions on Circuits and Systems.

Research Experience

- 2024.04-2024.11: Research Intern, supervised by Dr. Zhong Zhang and Peilin Zhao, Tencent AI Lab.

Education

- 2022.09-2025.09 (expected): Ph.D., supervised by Prof. Song Guo and Wenchao Xu, The Hong Kong Polytechnic University.
- 2024.10-2025.09 (expected): Visiting Ph.D., supervised by Prof. Dacheng Tao, Nanyang Technological University.
- 2019.09-2022.06: M.S., supervised by Prof. Xuegui Zhu and Lei Zhang, Chongqing University.
- 2015.09-2019.06: B.S. (Minor in Finance), China University of Mining and Technology.

Background